[ 
https://issues.apache.org/jira/browse/HDFS-10586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15399467#comment-15399467
 ] 

Rakesh R commented on HDFS-10586:
---------------------------------

[~gaoshbj], could you please analyse the client and datanode logs to check 
whether a poor network or unreachable datanodes caused read failures on more 
datanodes than the policy has parity blocks? FYI, HDFS-10697 recently 
discussed one such case.
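
For reference, a minimal sketch of the rule behind that question, assuming the 
"6-3" policy means 6 data units and 3 parity units per stripe. The class and 
method names below are hypothetical and are not part of Hadoop; this only 
models the arithmetic, not the actual DFSStripedInputStream logic.

```java
// Hypothetical helper, not part of Hadoop: models when a striped block group
// can still be decoded under a Reed-Solomon style layout.
public class StripeTolerance {

    /** Returns true if a stripe with the given number of missing units can still be decoded. */
    static boolean isRecoverable(int dataUnits, int parityUnits, int missingUnits) {
        if (dataUnits <= 0 || parityUnits < 0 || missingUnits < 0
                || missingUnits > dataUnits + parityUnits) {
            throw new IllegalArgumentException("invalid stripe parameters");
        }
        // Any dataUnits of the (dataUnits + parityUnits) units are enough to
        // decode, so at most parityUnits units may be missing.
        return missingUnits <= parityUnits;
    }

    public static void main(String[] args) {
        System.out.println(isRecoverable(6, 3, 3)); // 3 DataNodes down: prints true
        System.out.println(isRecoverable(6, 3, 4)); // the reported stripe: prints false
    }
}
```

With 3 DataNodes down the stripe should still be readable, so a fourth missing 
chunk suggests an additional read failure on a datanode that is still alive.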

> Erasure Code misfunctions when 3 DataNode down
> ----------------------------------------------
>
>                 Key: HDFS-10586
>                 URL: https://issues.apache.org/jira/browse/HDFS-10586
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: erasure-coding
>    Affects Versions: 3.0.0-alpha1
>         Environment: 9 DataNodes and 1 NameNode. The erasure coding policy is 
> set to "6-3". When 3 DataNodes are down, erasure-coded reads fail and an 
> exception is thrown.
>            Reporter: gao shan
>
> Steps to reproduce:
> 1) hadoop fs -mkdir /ec
> 2) Set the erasure coding policy to "6-3".
> 3) "Write" data with:
> time hadoop jar 
> /opt/hadoop/hadoop-3.0.0-SNAPSHOT/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.0.0-SNAPSHOT.jar
>   TestDFSIO -D test.build.data=/ec -write -nrFiles 30 -fileSize 12288 
> -bufferSize 1073741824
> 4) Manually bring down 3 nodes by killing the "datanode" and "nodemanager" 
> processes on 3 DataNodes.
> 5) "Read" the erasure-coded data with:
> time hadoop jar 
> /opt/hadoop/hadoop-3.0.0-SNAPSHOT/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.0.0-SNAPSHOT.jar
>   TestDFSIO -D test.build.data=/ec -read -nrFiles 30 -fileSize 12288 
> -bufferSize 1073741824
> The read then fails with the following exception:
> INFO mapreduce.Job: Task Id : attempt_1465445965249_0008_m_000034_2, Status : FAILED
> Error: java.io.IOException: 4 missing blocks, the stripe is: Offset=0, length=8388608, fetchedChunksNum=0, missingChunksNum=4
>       at org.apache.hadoop.hdfs.DFSStripedInputStream$StripeReader.checkMissingBlocks(DFSStripedInputStream.java:614)
>       at org.apache.hadoop.hdfs.DFSStripedInputStream$StripeReader.readParityChunks(DFSStripedInputStream.java:647)
>       at org.apache.hadoop.hdfs.DFSStripedInputStream$StripeReader.readStripe(DFSStripedInputStream.java:762)
>       at org.apache.hadoop.hdfs.DFSStripedInputStream.readOneStripe(DFSStripedInputStream.java:316)
>       at org.apache.hadoop.hdfs.DFSStripedInputStream.readWithStrategy(DFSStripedInputStream.java:450)
>       at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:941)
>       at java.io.DataInputStream.read(DataInputStream.java:149)
>       at org.apache.hadoop.fs.TestDFSIO$ReadMapper.doIO(TestDFSIO.java:531)
>       at org.apache.hadoop.fs.TestDFSIO$ReadMapper.doIO(TestDFSIO.java:508)
>       at org.apache.hadoop.fs.IOMapperBase.map(IOMapperBase.java:134)
>       at org.apache.hadoop.fs.IOMapperBase.map(IOMapperBase.java:37)
>       at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
>       at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
>       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
>       at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:422)
>       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1669)
>       at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)
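
When going through the client logs, one possible way to spot stripes that lost 
more chunks than the policy can tolerate is to extract missingChunksNum from 
the exception message, as in the rough sketch below. The class name, the 
regular expression, and the parity count of 3 (taken from the "6-3" policy) 
are assumptions for illustration, not Hadoop APIs.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical log helper, not part of Hadoop.
public class MissingChunksCheck {

    // Matches the "missingChunksNum=<n>" field seen in the reported exception message.
    private static final Pattern MISSING = Pattern.compile("missingChunksNum=(\\d+)");

    /** Returns the missingChunksNum value found in the message, or -1 if the field is absent. */
    static int missingChunks(String message) {
        Matcher m = MISSING.matcher(message);
        return m.find() ? Integer.parseInt(m.group(1)) : -1;
    }

    public static void main(String[] args) {
        String msg = "4 missing blocks, the stripe is: Offset=0, "
                + "length=8388608, fetchedChunksNum=0, missingChunksNum=4";
        int parityUnits = 3; // assumption: the "6-3" policy has 3 parity units
        int missing = missingChunks(msg);
        // prints "4 missing, exceeds parity: true"
        System.out.println(missing + " missing, exceeds parity: " + (missing > parityUnits));
    }
}
```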


