Andrew Wang created HDFS-4688:
---------------------------------
Summary: DFSClient should not allow multiple concurrent creates
for the same file
Key: HDFS-4688
URL: https://issues.apache.org/jira/browse/HDFS-4688
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 2.0.3-alpha, 3.0.0
Reporter: Andrew Wang
Assignee: Andrew Wang
Credit to Harsh for tracking down most of this.
If a DFSClient calls create with overwrite multiple times on the same file, it
can get into a bad state. The exact failure mode depends on the state of the
file, but at the very least one DFSOutputStream will "win" over the others, so
data written to the other DFSOutputStreams is lost. While this is perhaps
acceptable under overwrite semantics, we've also seen cases where the DFSClient
loops indefinitely on close and blocks get marked as corrupt. That is not
acceptable.
One fix is to add locking to DFSClient that prevents a user from opening
multiple concurrent output streams to the same path.
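
A minimal sketch of what such a guard could look like, written in plain Java
with no Hadoop dependencies. The class name OpenPathGuard and its methods are
hypothetical and are not part of DFSClient; the sketch only illustrates the
idea of tracking, per client, which paths currently have an open output stream.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical helper (not an existing HDFS class): remembers which paths
 * this client currently has open for writing.
 */
class OpenPathGuard {
    // Thread-safe set of paths that currently have an open output stream.
    private final Set<String> openPaths = ConcurrentHashMap.newKeySet();

    /**
     * Tries to claim the path for writing.
     * Returns false if this client already holds an open stream on it.
     */
    boolean tryAcquire(String path) {
        // Set.add is atomic here and returns false if the path was present.
        return openPaths.add(path);
    }

    /** Releases the path; meant to be called when the stream is closed. */
    void release(String path) {
        openPaths.remove(path);
    }
}
```

Under this assumption, create would call tryAcquire before opening a stream and
fail (for example with an IOException) when it returns false, and close would
call release. The guard only covers streams opened through the same client
instance; it does nothing about concurrent creates from other clients.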