[
https://issues.apache.org/jira/browse/HDFS-9289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977260#comment-14977260
]
Zhe Zhang commented on HDFS-9289:
---------------------------------
bq. I think there probably exists some cache coherence issue
This sounds possible. Maybe the {{DFSOutputStream}} thread uses a stale copy of
{{block}} in {{completeFile}}, after {{block}} is updated by the
{{DataStreamer}} thread.
bq. Then a pipeline update happened with only d2 and d3 with the new GS. Then the
file completed with the old GS and d2 and d3 were marked corrupt.
Do you have any log showing that "replica marked as corrupt because its GS is
newer than the block GS on NN"?
Regardless, making {{DataStreamer#block}} volatile is a good change. Ideally we
should add a test emulating the cache coherence problem, but that doesn't look
easy.
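To illustrate the visibility concern: a minimal sketch (hypothetical class and field names, not the actual HDFS code) of the pattern where one thread replaces the block with a new genStamp during pipeline recovery while another thread later reads it to complete the file. Without {{volatile}}, the completing thread may see a stale reference; {{volatile}} guarantees the write is visible.

{code}
// Sketch only: stand-ins for DataStreamer#block and the two threads
// involved. Names here are illustrative, not the real HDFS API.
public class VolatileBlockSketch {
    // Without "volatile", the completing thread could read a stale
    // genStamp cached from before pipeline recovery.
    private volatile long blockGenStamp = 1L;

    // Runs on the DataStreamer thread during pipeline recovery.
    void onPipelineRecovery(long newGenStamp) {
        blockGenStamp = newGenStamp;
    }

    // Runs on the DFSOutputStream thread when completing the file.
    long genStampForComplete() {
        return blockGenStamp;
    }

    public static void main(String[] args) throws InterruptedException {
        VolatileBlockSketch s = new VolatileBlockSketch();
        Thread streamer = new Thread(() -> s.onPipelineRecovery(2L));
        streamer.start();
        // join() makes this demo deterministic; in HDFS the threads
        // overlap without such synchronization, which is why the
        // volatile field matters.
        streamer.join();
        System.out.println(s.genStampForComplete()); // prints 2
    }
}
{code}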
> check genStamp when complete file
> ---------------------------------
>
> Key: HDFS-9289
> URL: https://issues.apache.org/jira/browse/HDFS-9289
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Chang Li
> Assignee: Chang Li
> Priority: Critical
> Attachments: HDFS-9289.1.patch, HDFS-9289.2.patch, HDFS-9289.3.patch
>
>
> We have seen a case of a corrupt block caused by a file completing after a
> pipelineUpdate, but with the old block genStamp. This caused the replicas on
> the two datanodes in the updated pipeline to be viewed as corrupt. Propose to
> check the genStamp when committing the block.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)