[ https://issues.apache.org/jira/browse/HDFS-8323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14539696#comment-14539696 ]

Walter Su commented on HDFS-8323:
---------------------------------

1. {{DFSStripedOutputStream}} holds the streamer's lock when it calls 
{{setExternalError}}, but the streamer thread itself sets {{hasError=false}} 
without the {{synchronized}} keyword, so that write is not protected by the same lock.
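A minimal sketch of the locking discipline that point 1 implies, assuming the error flags live in a single per-streamer object: every read and write goes through {{synchronized}} methods, so the streamer thread's reset cannot race with {{setExternalError}}. The class {{StreamerErrorState}} and every method other than {{setExternalError}} are hypothetical; this is not the actual {{DataStreamer}} code.

```java
// Hypothetical sketch, not the real DataStreamer: an error flag written both by
// the output stream (setExternalError) and by the streamer thread (clearError).
// All accesses use the same monitor (this), so no write races with another.
class StreamerErrorState {
  private boolean hasError = false;
  private boolean externalError = false;

  // Called by the output stream while it coordinates the streamers.
  synchronized void setExternalError() {
    externalError = true;
    hasError = true;
  }

  // Called by the streamer thread once it has handled the error.
  // Without "synchronized" this write could race with setExternalError().
  synchronized void clearError() {
    hasError = false;
    externalError = false;
  }

  synchronized boolean hasError() {
    return hasError;
  }

  synchronized boolean isExternalError() {
    return externalError;
  }
}
```

Making the reads synchronized as well matters: it guarantees the streamer thread sees the latest value written by the output stream, not just that the writes don't interleave.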

2. This one is harder to describe, so first look at the code below.
{code:title=BlockManager.java}
  public LocatedBlock newLocatedBlock(ExtendedBlock eb, BlockInfo info,
      DatanodeStorageInfo[] locs, long offset) throws IOException {
    final LocatedBlock lb;
    if (info.isStriped()) {
      // The indices reflect only the internal blocks reported so far,
      // so there may be fewer of them than there are streamers.
      lb = newLocatedStripedBlock(eb, locs,
          ((BlockInfoStripedUnderConstruction)info).getBlockIndices(),
          offset, false);
    } else {
      lb = newLocatedBlock(eb, locs, offset, false);
    }
    setBlockToken(lb, BlockTokenIdentifier.AccessMode.WRITE);
    return lb;
  }
{code}
The returned indices come from 
{{BlockInfoStripedUnderConstruction.getBlockIndices()}}, whose length depends 
on block reports and therefore changes over time. This can be problematic.
Consider the following situations:
*Situation A*
1. Create 9 streamers.
2. Streamer #5 fails at the very beginning.
3. Because #5 never connected to a DN, only 8 UC (under-construction) blocks 
are created when the file is created.
4. The last {{BlockInfoStripedUnderConstruction}} has 8 replicas reported.
5. The leading streamer #0 recovers the pipeline and gets a new locatedBlock.
6. {{StripedBlockUtil.parseStripedBlockGroup(..)}} creates locatedBlock\[9\].
7. locatedBlock\[5\] is null.
8. The leading streamer #0 hits an NPE:
{noformat}
2015-05-12 18:47:11,117 WARN  hdfs.DataStreamer (DataStreamer.java:run(572)) - DataStreamer Exception
java.lang.NullPointerException
    at java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:410)
    at org.apache.hadoop.hdfs.DFSStripedOutputStream$MultipleBlockingQueue.offer(DFSStripedOutputStream.java:73)
    at org.apache.hadoop.hdfs.DFSStripedOutputStream$Coordinator.putStripedBlock(DFSStripedOutputStream.java:133)
    at org.apache.hadoop.hdfs.StripedDataStreamer.putLoactedBlocks(StripedDataStreamer.java:129)
    at org.apache.hadoop.hdfs.StripedDataStreamer.updateBlockForPipeline(StripedDataStreamer.java:136)
    at org.apache.hadoop.hdfs.DataStreamer.setupPipelineForAppendOrRecovery(DataStreamer.java:1291)
    at org.apache.hadoop.hdfs.DataStreamer.processDatanodeError(DataStreamer.java:1022)
{noformat}
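A minimal, hypothetical reproduction of Situation A, assuming {{parseStripedBlockGroup}} places each reported internal block at its own index in an array sized to the number of streamers. Plain strings stand in for {{LocatedBlock}} objects, and {{StripedGroupSketch}} and its methods are made up for illustration. It shows that an unreported index leaves a {{null}} slot, that passing that slot to {{LinkedBlockingQueue.offer}} throws {{NullPointerException}} (as in the stack trace above), and one possible guard that reports the missing indices instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch of Situation A; not the real StripedBlockUtil code.
class StripedGroupSketch {
  // Put each reported internal block at its own index. Indices that were
  // never reported (streamer #5 in Situation A) stay null.
  static String[] parseGroup(byte[] reportedIndices, int numStreamers) {
    String[] blocks = new String[numStreamers];
    for (byte idx : reportedIndices) {
      blocks[idx] = "blk_" + idx;
    }
    return blocks;
  }

  // One queue per streamer, standing in for the per-streamer hand-off.
  static List<LinkedBlockingQueue<String>> newQueues(int numStreamers) {
    List<LinkedBlockingQueue<String>> queues = new ArrayList<>();
    for (int i = 0; i < numStreamers; i++) {
      queues.add(new LinkedBlockingQueue<>());
    }
    return queues;
  }

  // Unguarded hand-off: LinkedBlockingQueue.offer(null) throws
  // NullPointerException, matching the failure in the stack trace.
  static void offerAllUnchecked(String[] blocks,
      List<LinkedBlockingQueue<String>> queues) {
    for (int i = 0; i < blocks.length; i++) {
      queues.get(i).offer(blocks[i]);
    }
  }

  // Guarded hand-off: skip null slots and return their indices so the
  // caller can deal with those streamers explicitly.
  static List<Integer> offerAllChecked(String[] blocks,
      List<LinkedBlockingQueue<String>> queues) {
    List<Integer> missing = new ArrayList<>();
    for (int i = 0; i < blocks.length; i++) {
      if (blocks[i] == null) {
        missing.add(i);
      } else {
        queues.get(i).offer(blocks[i]);
      }
    }
    return missing;
  }
}
```

With two failed DNs (Situation B), the same parsing leaves two null slots and {{offerAllChecked}} returns two indices instead of one. Whether those streamers should then simply be marked as failed is a design decision for the patch, not something this sketch settles.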
*Situation B*
Two DNs fail in a row. This causes the same problem as Situation A.

BTW, I saw some merge conflicts with HDFS-8220. Could you take a look and check 
whether the changes are OK with you?

> Bump GenerationStamp for write faliure in DFSStripedOutputStream
> ----------------------------------------------------------------
>
>                 Key: HDFS-8323
>                 URL: https://issues.apache.org/jira/browse/HDFS-8323
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Tsz Wo Nicholas Sze
>            Assignee: Tsz Wo Nicholas Sze
>         Attachments: h8323_20150511.patch, h8323_20150511b.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
