[ https://issues.apache.org/jira/browse/SPARK-26261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16710883#comment-16710883 ]

Jialin Liu commented on SPARK-26261:
------------------------------------

Our initial test is:

We run a word-count workflow that persists blocks to disk. Once we have 
confirmed that some blocks exist on disk, we use the truncate command to cut 
off part of a block file. We then compare the result against the result 
produced by the same workflow without fault injection.
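A minimal sketch of the detection idea the report proposes (plain Python with hypothetical file names, not actual Spark code): record a checksum when a block is written, then re-verify it before the block is read, so a truncation like the one injected above is caught instead of silently producing a wrong result.

```python
import hashlib
import os
import tempfile

def sha256_of(path):
    """Return the hex SHA-256 digest of a file's contents."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

# Write a stand-in "block" file and record its checksum at write time.
fd, block_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"word count partial results " * 100)
checksum_at_write = sha256_of(block_path)

# Fault injection: truncate part of the block, as in the test above.
with open(block_path, "r+b") as f:
    f.truncate(100)

# Re-checking the checksum before the block is read catches the corruption.
corruption_detected = sha256_of(block_path) != checksum_at_write
print("corruption detected:", corruption_detected)  # corruption detected: True
os.remove(block_path)
```

Without such a check, the truncated block is read back as-is and the job completes with silently wrong output, which is exactly the behavior observed in the test.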

> Spark does not check completeness of temporary files
> -----------------------------------------------------
>
>                 Key: SPARK-26261
>                 URL: https://issues.apache.org/jira/browse/SPARK-26261
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.3.2
>            Reporter: Jialin Liu
>            Priority: Minor
>
> Spark does not check the completeness of temporary files. When persisting to 
> disk is enabled on some RDDs, a number of temporary files are created in the 
> blockmgr folder. The block manager is able to detect missing blocks, but it 
> is not able to detect file contents being modified during execution. 
> Our initial test shows that if we truncate a block file before it is used 
> by the executors, the program finishes without detecting any error, but the 
> result is completely wrong.
> We believe there should be a checksum for every RDD block file, so that 
> these files are protected against silent corruption.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
