[ https://issues.apache.org/jira/browse/SPARK-19013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15799881#comment-15799881 ]
Tim Chan commented on SPARK-19013:
----------------------------------
[~zsxwing]
{code}
Error:
java.util.ConcurrentModificationException: Multiple HDFSMetadataLog are using s3://lumos-emr-logs/streaming-insights-ebb-and-flow-speed-accuracy/offsets
    at org.apache.spark.sql.execution.streaming.HDFSMetadataLog.org$apache$spark$sql$execution$streaming$HDFSMetadataLog$$writeBatch(HDFSMetadataLog.scala:162)
    at org.apache.spark.sql.execution.streaming.HDFSMetadataLog$$anonfun$add$1$$anonfun$apply$mcZ$sp$1.apply$mcV$sp(HDFSMetadataLog.scala:119)
    at org.apache.spark.sql.execution.streaming.HDFSMetadataLog$$anonfun$add$1$$anonfun$apply$mcZ$sp$1.apply(HDFSMetadataLog.scala:119)
    at org.apache.spark.sql.execution.streaming.HDFSMetadataLog$$anonfun$add$1$$anonfun$apply$mcZ$sp$1.apply(HDFSMetadataLog.scala:119)
    at org.apache.spark.util.UninterruptibleThread.runUninterruptibly(UninterruptibleThread.scala:79)
    at org.apache.spark.sql.execution.streaming.HDFSMetadataLog$$anonfun$add$1.apply$mcZ$sp(HDFSMetadataLog.scala:119)
    at org.apache.spark.sql.execution.streaming.HDFSMetadataLog$$anonfun$add$1.apply(HDFSMetadataLog.scala:115)
    at org.apache.spark.sql.execution.streaming.HDFSMetadataLog$$anonfun$add$1.apply(HDFSMetadataLog.scala:115)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.sql.execution.streaming.HDFSMetadataLog.add(HDFSMetadataLog.scala:115)
    at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$constructNextBatch$1.apply$mcV$sp(StreamExecution.scala:346)
    at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$constructNextBatch$1.apply(StreamExecution.scala:345)
    at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$constructNextBatch$1.apply(StreamExecution.scala:345)
    at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$reportTimeTaken(StreamExecution.scala:656)
    at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$constructNextBatch(StreamExecution.scala:345)
    at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches$1$$anonfun$1.apply$mcZ$sp(StreamExecution.scala:219)
    at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches$1$$anonfun$1.apply(StreamExecution.scala:213)
    at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches$1$$anonfun$1.apply(StreamExecution.scala:213)
    at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$reportTimeTaken(StreamExecution.scala:656)
    at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches$1.apply$mcZ$sp(StreamExecution.scala:212)
    at org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:43)
    at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches(StreamExecution.scala:208)
    at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:142)
Caused by: java.io.FileNotFoundException: No such file or directory 's3://lumos-emr-logs/streaming-insights-ebb-and-flow-speed-accuracy/offsets/.45b98c69-6158-4434-a7b2-c3f73d27294e.tmp'
    at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.getFileStatus(S3NativeFileSystem.java:812)
    at org.apache.hadoop.fs.FileSystem.getFileLinkStatus(FileSystem.java:2286)
    at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.getFileLinkStatus(EmrFileSystem.java:521)
    at org.apache.hadoop.fs.DelegateToFileSystem.getFileLinkStatus(DelegateToFileSystem.java:130)
    at org.apache.hadoop.fs.AbstractFileSystem.renameInternal(AbstractFileSystem.java:705)
    at org.apache.hadoop.fs.AbstractFileSystem.rename(AbstractFileSystem.java:678)
    at org.apache.hadoop.fs.FileContext.rename(FileContext.java:958)
    at org.apache.spark.sql.execution.streaming.HDFSMetadataLog$FileContextManager.rename(HDFSMetadataLog.scala:309)
    at org.apache.spark.sql.execution.streaming.HDFSMetadataLog.org$apache$spark$sql$execution$streaming$HDFSMetadataLog$$writeBatch(HDFSMetadataLog.scala:150)
    ... 22 more
ApplicationMaster host: 172.16.0.177
ApplicationMaster RPC port: 0
queue: default
start time: 1483390859959
final status: FAILED
tracking URL: http://ip-172-16-0-176.ec2.internal:20888/proxy/application_1482466545028_0014/
user: hadoop
Exception in thread "main" org.apache.spark.SparkException: Application application_1482466545028_0014 finished with failed status
    at org.apache.spark.deploy.yarn.Client.run(Client.scala:1132)
    at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1178)
    at org.apache.spark.deploy.yarn.Client.main(Client.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:736)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
{code}
I've verified that this error does not occur when I use an HDFS path for my checkpoint location; it only reproduces with an S3 checkpoint location.
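For reference, a minimal sketch of how the checkpoint location is wired up in the two cases (Spark 2.0.x Structured Streaming API; the source, bucket names and paths below are placeholders, not the actual job):
{code}
import org.apache.spark.sql.SparkSession

object CheckpointLocationSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("checkpoint-location-sketch").getOrCreate()

    // Placeholder streaming source; the ticket does not show the real one.
    val input = spark.readStream.text("s3://my-bucket/input/")

    val query = input.writeStream
      .format("parquet")
      .option("path", "s3://my-bucket/output/")
      // Failing case from the trace above: the offsets metadata log lives on S3,
      // and the rename of the .offsets .tmp file fails with FileNotFoundException.
      .option("checkpointLocation", "s3://my-bucket/checkpoints/")
      // Working case per the comment: point the checkpoint at HDFS instead, e.g.
      //   .option("checkpointLocation", "hdfs:///user/hadoop/checkpoints/")
      .start()

    query.awaitTermination()
  }
}
{code}
Only the {{checkpointLocation}} option differs between the failing and working runs; everything else in the sketch is illustrative.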
> java.util.ConcurrentModificationException when using s3 path as
> checkpointLocation
> -----------------------------------------------------------------------------------
>
> Key: SPARK-19013
> URL: https://issues.apache.org/jira/browse/SPARK-19013
> Project: Spark
> Issue Type: Bug
> Components: Structured Streaming
> Affects Versions: 2.0.2
> Reporter: Tim Chan
>
> I have a structured streaming job running on EMR. The job fails with this:
> {code}
> Multiple HDFSMetadataLog are using s3://mybucket/myapp
> org.apache.spark.sql.execution.streaming.HDFSMetadataLog.org$apache$spark$sql$execution$streaming$HDFSMetadataLog$$writeBatch(HDFSMetadataLog.scala:162)
> {code}
> There is only one instance of this stream job running.