[ https://issues.apache.org/jira/browse/SPARK-22752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16287305#comment-16287305 ]

Marco Gaido commented on SPARK-22752:
-------------------------------------

Hi [~zsxwing], thanks for looking at this. The checkpointDir is on HDFS. Inside 
it we have:
{noformat}
metadata    0.1 kB
commits
offsets
sources
state
{noformat}
where {{commits}} contains files from {{0}} to {{13}}, {{offsets}} contains files 
from {{0}} to {{14}}, {{sources}} contains only the file {{0/0}}, and {{state}} 
contains only the directory {{0/0}}, which is empty.
Do you need any other information? Do you have any idea what the root cause of 
the issue might be? Thanks.
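For reference, the failing code path ({{HDFSBackedStateStoreProvider.loadMap}} / {{updateFromDeltaFile}} in the stack trace below) rebuilds the state map for a batch by replaying the per-version delta files in order, so a missing delta makes every later version unrecoverable. A loose, self-contained sketch of that idea (this is not Spark's actual implementation; the file format and function name here are invented for illustration):

```python
# Hypothetical sketch of delta-file replay, loosely modeled on what
# HDFSBackedStateStoreProvider does. Spark's real delta files are binary;
# here each delta is a text file of "key=value" lines for simplicity.
import os

def load_state(state_dir: str, version: int) -> dict:
    """Rebuild the state map for `version` by applying deltas 1..version."""
    state = {}
    for v in range(1, version + 1):
        delta = os.path.join(state_dir, f"{v}.delta")
        if not os.path.exists(delta):
            # Mirrors the reported failure: recovery cannot proceed past
            # a missing delta file.
            raise RuntimeError(
                f"Error reading delta file {delta}: {delta} does not exist")
        with open(delta) as f:
            for line in f:
                key, _, value = line.rstrip("\n").partition("=")
                state[key] = value  # apply one update from this delta
    return state
```

Under this model, an empty {{state/0/0}} directory means even {{1.delta}} is absent, so restoring any version at restart fails immediately, which matches the exception in the report.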

> FileNotFoundException while reading from Kafka
> ----------------------------------------------
>
>                 Key: SPARK-22752
>                 URL: https://issues.apache.org/jira/browse/SPARK-22752
>             Project: Spark
>          Issue Type: Bug
>          Components: Structured Streaming
>    Affects Versions: 2.2.0
>            Reporter: Marco Gaido
>
> We are running a stateful structured streaming job which reads from Kafka and 
> writes to HDFS. And we are hitting this exception:
> {noformat}
> 17/12/08 05:20:12 ERROR FileFormatWriter: Aborting job null.
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
> stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 
> (TID 4, hcube1-1n03.eng.hortonworks.com, executor 1): 
> java.lang.IllegalStateException: Error reading delta file 
> /checkpointDir/state/0/0/1.delta of HDFSStateStoreProvider[id = (op=0, 
> part=0), dir = /checkpointDir/state/0/0]: /checkpointDir/state/0/0/1.delta 
> does not exist
>       at 
> org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider.org$apache$spark$sql$execution$streaming$state$HDFSBackedStateStoreProvider$$updateFromDeltaFile(HDFSBackedStateStoreProvider.scala:410)
>       at 
> org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider$$anonfun$org$apache$spark$sql$execution$streaming$state$HDFSBackedStateStoreProvider$$loadMap$1$$anonfun$6.apply(HDFSBackedStateStoreProvider.scala:362)
>       at 
> org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider$$anonfun$org$apache$spark$sql$execution$streaming$state$HDFSBackedStateStoreProvider$$loadMap$1$$anonfun$6.apply(HDFSBackedStateStoreProvider.scala:359)
>       at scala.Option.getOrElse(Option.scala:121)
>       at 
> org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider$$anonfun$org$apache$spark$sql$execution$streaming$state$HDFSBackedStateStoreProvider$$loadMap$1.apply(HDFSBackedStateStoreProvider.scala:359)
>       at 
> org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider$$anonfun$org$apache$spark$sql$execution$streaming$state$HDFSBackedStateStoreProvider$$loadMap$1.apply(HDFSBackedStateStoreProvider.scala:358)
>       at scala.Option.getOrElse(Option.scala:121)
>       at 
> org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider.org$apache$spark$sql$execution$streaming$state$HDFSBackedStateStoreProvider$$loadMap(HDFSBackedStateStoreProvider.scala:358)
>       at 
> org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider$$anonfun$org$apache$spark$sql$execution$streaming$state$HDFSBackedStateStoreProvider$$loadMap$1$$anonfun$6.apply(HDFSBackedStateStoreProvider.scala:360)
>       at 
> org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider$$anonfun$org$apache$spark$sql$execution$streaming$state$HDFSBackedStateStoreProvider$$loadMap$1$$anonfun$6.apply(HDFSBackedStateStoreProvider.scala:359)
>       at scala.Option.getOrElse(Option.scala:121)
> {noformat}
> Of course, the file doesn't exist in HDFS, and the {{state/0/0}} directory 
> contains no files at all, while there are some files in the {{commits}} and 
> {{offsets}} folders. I am not sure what causes this behavior. It seems to 
> happen the second time the job is started, after the first run failed, so it 
> looks like task failures can trigger it. Or it might be related to 
> watermarks, since there were some problems with the incoming data that caused 
> the watermark to filter out all of it.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
