[ 
https://issues.apache.org/jira/browse/HDFS-9289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977260#comment-14977260
 ] 

Zhe Zhang commented on HDFS-9289:
---------------------------------

bq. I think there probably exists some cache coherence issue
This sounds possible. Maybe the {{DFSOutputStream}} thread uses a stale copy of 
{{block}} in {{completeFile}}, after {{block}} is updated by the 
{{DataStreamer}} thread.

bq. Then a pipeline update happens with only d2 and d3 with a new GS. Then the 
file completes with the old GS and d2 and d3 were marked corrupt.
Do you have any log showing that "replica marked as corrupt because its GS is 
newer than the block GS on NN"?

Regardless, making {{DataStreamer#block}} volatile is a good change. Ideally we 
should add a test to emulate the cache coherency problem but it doesn't look 
easy.
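For illustration, here is a minimal Java sketch of the visibility guarantee that making the field volatile provides (class and method names are hypothetical stand-ins, not Hadoop's actual code): without {{volatile}}, the Java memory model permits the completing thread to keep reading a stale {{block}} reference after the streamer thread has replaced it.

```java
// Minimal sketch (hypothetical names): a streamer thread replaces a shared
// "block" reference; volatile ensures the completing thread sees the update.
public class VolatileBlockSketch {
    static final class Block {
        final long genStamp;
        Block(long genStamp) { this.genStamp = genStamp; }
    }

    // volatile: every read observes the most recent write by any thread
    private volatile Block block = new Block(1001L);

    // plays the DataStreamer role: pipeline recovery bumps the GS
    void updatePipeline() {
        block = new Block(1002L);
    }

    // plays the DFSOutputStream role: must not complete with a stale GS
    long completeFile() {
        return block.genStamp;
    }

    public static void main(String[] args) throws InterruptedException {
        VolatileBlockSketch s = new VolatileBlockSketch();
        Thread streamer = new Thread(s::updatePipeline);
        streamer.start();
        streamer.join(); // join() also establishes happens-before
        System.out.println(s.completeFile());
    }
}
```

Note this only demonstrates the intended visibility semantics; it does not reproduce the race, since (as noted above) emulating the stale read deterministically in a test is not easy.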

> check genStamp when complete file
> ---------------------------------
>
>                 Key: HDFS-9289
>                 URL: https://issues.apache.org/jira/browse/HDFS-9289
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Chang Li
>            Assignee: Chang Li
>            Priority: Critical
>         Attachments: HDFS-9289.1.patch, HDFS-9289.2.patch, HDFS-9289.3.patch
>
>
> We have seen a case of a corrupt block caused by a file complete after a 
> pipelineUpdate, where the file completed with the old block genStamp. This 
> caused the replicas on the two datanodes in the updated pipeline to be viewed 
> as corrupt. Propose to check the genStamp when committing the block.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)