[ 
https://issues.apache.org/jira/browse/SPARK-38557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17507976#comment-17507976
 ] 

Hyukjin Kwon commented on SPARK-38557:
--------------------------------------

For questions, we should better interact with Spark mailing list. Let's file an 
issue after having a discussion there first.

> What may be a cause for HDFSMetadataCommitter: Error while fetching MetaData 
> and how to fix or work around this?
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-38557
>                 URL: https://issues.apache.org/jira/browse/SPARK-38557
>             Project: Spark
>          Issue Type: Question
>          Components: Structured Streaming
>    Affects Versions: 3.1.1
>         Environment: Spark 3.1.1
> AWS EMR 6.3.0
> python 3.7.2
>            Reporter: Dmitry Goldenberg
>            Priority: Major
>
> I'm seeing errors such as the below when executing structured Spark Streaming 
> app which streams data from AWS Kinesis.
>  
> I've googled the error but can't tell what may be the cause. Is Spark running 
> out of disk space? something else?
> {code:java}
> // From the stderr log in EMR
> 22/03/15 00:54:00 WARN HDFSMetadataCommitter: Error while fetching MetaData 
> [attempt = 1]
> java.lang.IllegalStateException: 
> hdfs://ip-10-2-XXX-XXX.awsinternal.acme.com:8020/mnt/tmp/temporary-03b8fecf-32d5-422c-9375-4c3450ed0bb8/sources/0/shard-commit/0
>  does not exist
>     at 
> org.apache.spark.sql.kinesis.HDFSMetadataCommitter.$anonfun$get$1(HDFSMetadataCommitter.scala:163)
>     at 
> org.apache.spark.sql.kinesis.HDFSMetadataCommitter.withRetry(HDFSMetadataCommitter.scala:229)
>     at 
> org.apache.spark.sql.kinesis.HDFSMetadataCommitter.get(HDFSMetadataCommitter.scala:151)
>     at 
> org.apache.spark.sql.kinesis.KinesisSource.prevBatchShardInfo(KinesisSource.scala:275)
>     at 
> org.apache.spark.sql.kinesis.KinesisSource.getOffset(KinesisSource.scala:163)
>     at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$constructNextBatch$6(MicroBatchExecution.scala:399)
>     at 
> org.apache.spark.sql.execution.streaming.ProgressReporter.reportTimeTaken(ProgressReporter.scala:357)
>     at 
> org.apache.spark.sql.execution.streaming.ProgressReporter.reportTimeTaken$(ProgressReporter.scala:355)
>     at 
> org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:68)
>     at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$constructNextBatch$2(MicroBatchExecution.scala:399)
>     at 
> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
>     at scala.collection.immutable.Map$Map1.foreach(Map.scala:128)
>     at scala.collection.TraversableLike.map(TraversableLike.scala:238)
>     at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
>     at scala.collection.AbstractTraversable.map(Traversable.scala:108)
>     at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$constructNextBatch$1(MicroBatchExecution.scala:382)
>     at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23)
>     at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution.withProgressLocked(MicroBatchExecution.scala:613)
>     at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution.constructNextBatch(MicroBatchExecution.scala:378)
>     at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$runActivatedStream$2(MicroBatchExecution.scala:211)
>     at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>     at 
> org.apache.spark.sql.execution.streaming.ProgressReporter.reportTimeTaken(ProgressReporter.scala:357)
>     at 
> org.apache.spark.sql.execution.streaming.ProgressReporter.reportTimeTaken$(ProgressReporter.scala:355)
>     at 
> org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:68)
>     at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$runActivatedStream$1(MicroBatchExecution.scala:194)
>     at 
> org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:57)
>     at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution.runActivatedStream(MicroBatchExecution.scala:188)
>     at 
> org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:333)
>     at 
> org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:244){code}
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to