[ 
https://issues.apache.org/jira/browse/SPARK-5142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14286387#comment-14286387
 ] 

Tathagata Das commented on SPARK-5142:
--------------------------------------

This is definitely a tricky issue. One thing we could try is to close the current 
log file, open a new log file, and then retry the write. The data may already have 
been partially written to the previous log file, but that does not matter: if the 
second attempt succeeds, the metadata will reference the segment in the new log 
file. The stale data left in the old log file can stay around; it is harmless.
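
A rough sketch of that approach, assuming a simple length-prefixed record layout 
and using plain local java.io streams as a stand-in for the HDFS output stream 
(the class and segment names here are hypothetical, not Spark's actual 
WriteAheadLogWriter API):

{code:scala}
import java.io.{DataOutputStream, File, FileOutputStream, IOException}

// Hypothetical segment descriptor: which file, at what offset, how many bytes.
case class LogSegment(path: String, offset: Long, length: Int)

class RollingLogWriter(dir: File, maxAttempts: Int = 3) {
  private var currentFile = newLogFile()
  private var out = open(currentFile)
  private var position = 0L

  private def newLogFile(): File = new File(dir, s"log-${System.nanoTime()}")
  private def open(f: File) = new DataOutputStream(new FileOutputStream(f))

  /** Write one record; on failure, roll over to a fresh log file and retry there. */
  def write(record: Array[Byte]): LogSegment = {
    var attempt = 0
    while (attempt < maxAttempts) {
      attempt += 1
      try {
        val offset = position
        out.writeInt(record.length)        // length-prefixed record
        out.write(record)
        out.flush()                        // hflush()/hsync() on a real HDFS stream
        position += 4 + record.length
        // The returned segment points at the file used by the successful attempt,
        // so any partial bytes left in an abandoned file are never referenced.
        return LogSegment(currentFile.getPath, offset, 4 + record.length)
      } catch {
        case e: IOException =>
          if (attempt == maxAttempts) throw e
          try out.close() catch { case _: IOException => () }  // abandon the old file
          currentFile = newLogFile()                           // roll over
          out = open(currentFile)
          position = 0L
      }
    }
    throw new IllegalStateException("unreachable")
  }

  def close(): Unit = out.close()
}
{code}

A real HDFS-backed writer would use an FSDataOutputStream with hflush() instead, 
but the control flow would be the same: the caller's metadata only ever refers to 
data written by the attempt that succeeded, and the abandoned file's tail is 
simply never read.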

> Possibly data may be ruined in Spark Streaming's WAL mechanism.
> ---------------------------------------------------------------
>
>                 Key: SPARK-5142
>                 URL: https://issues.apache.org/jira/browse/SPARK-5142
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Streaming
>    Affects Versions: 1.2.0
>            Reporter: Saisai Shao
>
> Currently in Spark Streaming's WAL manager, data is written to HDFS with 
> multiple retries on failure. Because there is no transactional guarantee, 
> previously partially-written data is not rolled back and the retried data is 
> appended after it; this corrupts the file and causes the WriteAheadLogReader 
> to fail when reading it back.
> Firstly, I think this problem is hard to fix because HDFS supports neither a 
> truncate operation (HDFS-3107) nor random writes at a specific offset.
> Secondly, I think that if we hit such a write exception, it is better not to 
> retry; retrying corrupts the file and makes reads fail.
> Sorry if I have misunderstood anything.
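
For what it's worth, here is a toy reproduction of the failure mode described 
above (hypothetical file name, plain local java.io streams instead of HDFS, and a 
simple length-prefixed record layout assumed purely for illustration):

{code:scala}
import java.io.{DataInputStream, DataOutputStream, FileInputStream, FileOutputStream}

object CorruptedWalDemo extends App {
  val path   = "wal-demo.log"
  val record = "event-payload".getBytes("UTF-8")

  // Simulate a failed write: the length prefix goes out, but only half the payload.
  val out = new DataOutputStream(new FileOutputStream(path))
  out.writeInt(record.length)
  out.write(record, 0, record.length / 2)
  // Retry without truncation: the full record is appended after the partial one.
  out.writeInt(record.length)
  out.write(record)
  out.close()

  // A length-prefixed reader now swallows the second record's header as payload of
  // the first, then trips over a nonsense length prefix and gives up.
  val in = new DataInputStream(new FileInputStream(path))
  try {
    while (true) {
      val len = in.readInt()
      require(len >= 0 && len <= 1024, s"corrupt length prefix: $len")
      val buf = new Array[Byte](len)
      in.readFully(buf)
      println(s"read record of $len bytes")
    }
  } catch {
    case e: Exception => println(s"reader failed: ${e.getMessage}")
  } finally {
    in.close()
  }
}
{code}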


