[ https://issues.apache.org/jira/browse/HDFS-8323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14539696#comment-14539696 ]
Walter Su commented on HDFS-8323:
---------------------------------
1. {{DFSStripedOutputStream}} holds the streamer lock when it calls
{{setExternalError}}, but the streamer thread itself sets {{hasError=false}}
without the {{synchronized}} keyword, so the two writes are not mutually exclusive.
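Point 1 in miniature: a hypothetical {{ErrorState}} class (not the real {{DataStreamer}} code) in which the external setter and the streamer's reset both synchronize on the same monitor. If the reset wrote {{hasError = false}} without the lock, that write would have no happens-before ordering with {{setExternalError}} and could interleave with code that reads or updates the flags under the lock. This is a sketch of one possible fix, assuming plain {{synchronized}} methods are acceptable; a {{volatile}} field alone would fix visibility but not compound check-then-act sequences.

{code:java}
// Hypothetical sketch; ErrorState is not a real HDFS class.
class ErrorState {
    private boolean hasError = false;
    private boolean externalError = false;

    // Called from another thread (e.g. the output stream) while holding this lock.
    public synchronized void setExternalError() {
        externalError = true;
        hasError = true;
    }

    // The streamer thread takes the same lock when clearing the flags;
    // an unsynchronized "hasError = false" could race with setExternalError().
    public synchronized void reset() {
        hasError = false;
        externalError = false;
    }

    public synchronized boolean hasError() {
        return hasError;
    }

    public synchronized boolean isExternalError() {
        return externalError;
    }

    public static void main(String[] args) throws InterruptedException {
        ErrorState state = new ErrorState();
        Thread other = new Thread(state::setExternalError);
        other.start();
        other.join();
        System.out.println(state.hasError()); // true
        state.reset();
        System.out.println(state.hasError()); // false
    }
}
{code}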
2. This one is harder to describe. First, look at the code below.
{code:title=BlockManager.java}
public LocatedBlock newLocatedBlock(ExtendedBlock eb, BlockInfo info,
    DatanodeStorageInfo[] locs, long offset) throws IOException {
  final LocatedBlock lb;
  if (info.isStriped()) {
    lb = newLocatedStripedBlock(eb, locs,
        ((BlockInfoStripedUnderConstruction)info).getBlockIndices(),
        offset, false);
  } else {
    lb = newLocatedBlock(eb, locs, offset, false);
  }
  setBlockToken(lb, BlockTokenIdentifier.AccessMode.WRITE);
  return lb;
}
{code}
The returned indices come from
{{BlockInfoStripedUnderConstruction.getBlockIndices()}}, whose length depends
on block reports and is therefore dynamic. That can be problematic.
Consider the following situations:
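To make the dynamic-length problem concrete, here is a hypothetical, simplified model of splitting a block group into per-index internal blocks. {{GroupParser.parseGroup}} is an illustrative name, not the real {{StripedBlockUtil.parseStripedBlockGroup(..)}} signature. The result array has one slot per streamer, so any index missing from the reported indices is left as {{null}}.

{code:java}
import java.util.Arrays;

// Hypothetical, simplified model of splitting a striped block group into
// per-index internal blocks. Not the real StripedBlockUtil API.
class GroupParser {
    static String[] parseGroup(String[] locations, byte[] indices, int groupWidth) {
        // Fixed-size result: one slot per internal block index.
        String[] internal = new String[groupWidth];
        for (int i = 0; i < indices.length; i++) {
            internal[indices[i]] = locations[i];
        }
        // Any index that was never reported stays null.
        return internal;
    }

    public static void main(String[] args) {
        // 9 streamers, but index 5 never reached a DataNode, so only 8 were reported.
        byte[] indices = {0, 1, 2, 3, 4, 6, 7, 8};
        String[] locations = {"dn0", "dn1", "dn2", "dn3", "dn4", "dn6", "dn7", "dn8"};
        String[] internal = parseGroup(locations, indices, 9);
        System.out.println(Arrays.toString(internal));
        // [dn0, dn1, dn2, dn3, dn4, null, dn6, dn7, dn8]
    }
}
{code}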
*Situation A*
1. Create 9 streamers.
2. Streamer #5 fails from the start.
3. Since #5 never connected to a DN, only 8 UC (under-construction) blocks
are created when the file is created.
4. The last BlockInfoStripedUnderConstruction has 8 replicas reported.
5. Leading streamer #0 recovers the pipeline and gets a new locatedBlock.
6. StripedBlockUtil.parseStripedBlockGroup(..) creates locatedBlock\[9\].
7. locatedBlock\[5\] == null.
8. Leading streamer #0 encounters an NPE:
{noformat}
2015-05-12 18:47:11,117 WARN hdfs.DataStreamer (DataStreamer.java:run(572)) - DataStreamer Exception
java.lang.NullPointerException
	at java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:410)
	at org.apache.hadoop.hdfs.DFSStripedOutputStream$MultipleBlockingQueue.offer(DFSStripedOutputStream.java:73)
	at org.apache.hadoop.hdfs.DFSStripedOutputStream$Coordinator.putStripedBlock(DFSStripedOutputStream.java:133)
	at org.apache.hadoop.hdfs.StripedDataStreamer.putLoactedBlocks(StripedDataStreamer.java:129)
	at org.apache.hadoop.hdfs.StripedDataStreamer.updateBlockForPipeline(StripedDataStreamer.java:136)
	at org.apache.hadoop.hdfs.DataStreamer.setupPipelineForAppendOrRecovery(DataStreamer.java:1291)
	at org.apache.hadoop.hdfs.DataStreamer.processDatanodeError(DataStreamer.java:1022)
{noformat}
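The top frame of the trace is {{LinkedBlockingQueue.offer}}, which rejects {{null}} elements with a {{NullPointerException}}. So a {{null}} slot for index 5, offered straight into that streamer's queue, produces exactly this exception. Below is a minimal sketch of a guard, assuming one queue per streamer; {{BlockDispatcher}} and its methods are hypothetical names. Whether the right policy is to skip the gap, mark the streamer as failed, or fail the whole write is a design decision this sketch does not settle; it only reports which indices were missing.

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: one queue per streamer, fed with the internal blocks of a
// newly allocated block group. Names do not correspond to real HDFS classes.
class BlockDispatcher {
    private final List<BlockingQueue<String>> queues = new ArrayList<>();

    BlockDispatcher(int numStreamers) {
        for (int i = 0; i < numStreamers; i++) {
            queues.add(new LinkedBlockingQueue<>());
        }
    }

    /** Offers each non-null block to its queue; returns the indices that were null. */
    List<Integer> dispatch(String[] internalBlocks) {
        List<Integer> missing = new ArrayList<>();
        for (int i = 0; i < internalBlocks.length; i++) {
            if (internalBlocks[i] == null) {
                // LinkedBlockingQueue.offer(null) throws NullPointerException,
                // so record the gap instead of offering it.
                missing.add(i);
            } else {
                queues.get(i).offer(internalBlocks[i]);
            }
        }
        return missing;
    }

    int queued(int index) {
        return queues.get(index).size();
    }

    public static void main(String[] args) {
        String[] blocks = {"b0", "b1", "b2", "b3", "b4", null, "b6", "b7", "b8"};
        BlockDispatcher d = new BlockDispatcher(9);
        System.out.println(d.dispatch(blocks)); // [5]

        // Without the guard, the same null reproduces the NPE in the trace.
        try {
            new LinkedBlockingQueue<String>().offer(null);
        } catch (NullPointerException e) {
            System.out.println("offer(null) -> NullPointerException");
        }
    }
}
{code}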
*Situation B*
Two DNs fail in a row. This causes the same problem as Situation A.
BTW, I saw some merge conflicts with HDFS-8220. Could you take a look
and see if the changes are OK with you?
> Bump GenerationStamp for write faliure in DFSStripedOutputStream
> ----------------------------------------------------------------
>
> Key: HDFS-8323
> URL: https://issues.apache.org/jira/browse/HDFS-8323
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Tsz Wo Nicholas Sze
> Assignee: Tsz Wo Nicholas Sze
> Attachments: h8323_20150511.patch, h8323_20150511b.patch
>
>
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)