[ https://issues.apache.org/jira/browse/HDFS-4688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629746#comment-13629746 ]
Fengdong Yu commented on HDFS-4688:
-----------------------------------

Hi Harsh, I did misunderstand; the attached program now illustrates everything, so I have attached a minor patch for this issue.

> DFSClient should not allow multiple concurrent creates for the same file
> ------------------------------------------------------------------------
>
>                 Key: HDFS-4688
>                 URL: https://issues.apache.org/jira/browse/HDFS-4688
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 3.0.0, 2.0.3-alpha
>            Reporter: Andrew Wang
>            Assignee: Andrew Wang
>         Attachments: HDFS-4688.txt, TestBadFileMaker.java
>
> Credit to Harsh for tracing down most of this.
> If a DFSClient does create with overwrite multiple times on the same file, we can get into bad states. The exact failure mode depends on the state of the file, but at the least one DFSOutputStream will "win" over the others, leading to data loss in the sense that data written to the other DFSOutputStreams will be lost. While this is perhaps okay because of overwrite semantics, we've also seen other cases where the DFSClient loops indefinitely on close and blocks get marked as corrupt. This is not okay.
> One fix for this is adding some locking to DFSClient which prevents a user from opening multiple concurrent output streams to the same path.
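The proposed fix above can be sketched as a per-client registry of paths with in-flight output streams, so a second concurrent create on the same path fails fast instead of silently clobbering the first writer. This is only an illustrative sketch, not the actual HDFS-4688 patch; the class name `CreateGuard` and its methods are hypothetical:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch only (not the real HDFS-4688 patch): a per-client
// registry of paths that currently have an open output stream. A second
// create() on the same path is rejected instead of racing the first.
class CreateGuard {
    private final Set<String> openPaths = ConcurrentHashMap.newKeySet();

    // Returns true if the caller may open an output stream for this path;
    // false if this client already has one open for the same path.
    boolean tryAcquire(String path) {
        return openPaths.add(path);
    }

    // In this sketch, the stream's close() would call release().
    void release(String path) {
        openPaths.remove(path);
    }
}
```

In a real patch the guard would live inside DFSClient, with acquisition in the create path and release tied to DFSOutputStream close (including error paths), so a leaked stream cannot block the path forever.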