[ 
https://issues.apache.org/jira/browse/HDFS-15779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17267024#comment-17267024
 ] 

Hongbing Wang edited comment on HDFS-15779 at 1/18/21, 5:26 AM:
----------------------------------------------------------------

I have two questions to discuss:
 * Is it intended that an exception is thrown only when `initTargetStreams() == 0`, 
rather than whenever the result is `< targets.length`?

{code:java}
// StripedWriter#init
if (initTargetStreams() == 0) {
  String error = "All targets are failed.";
  throw new IOException(error);
}{code}
 * Is a null check on the writer alone the best fix?

{code:java}
// StripedWriter#clearBuffers
void clearBuffers() {
  for (StripedBlockWriter writer : writers) {
    ByteBuffer targetBuffer = writer.getTargetBuffer();
    if (targetBuffer != null) {
      targetBuffer.clear();
    }
  }
}
{code}
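
To make the two questions concrete, here is a minimal sketch of both options side by side. The classes `ClearBuffersSketch` and `Writer` and the method `checkAllTargets` are hypothetical stand-ins written for illustration; they are not the Hadoop classes and not necessarily the patch that was committed.

```java
import java.io.IOException;
import java.nio.ByteBuffer;

// Hypothetical stand-in for StripedWriter; not the Hadoop class.
class ClearBuffersSketch {
  // Hypothetical stand-in for StripedBlockWriter.
  static final class Writer {
    private final ByteBuffer targetBuffer = ByteBuffer.allocate(8);

    ByteBuffer getTargetBuffer() {
      return targetBuffer;
    }
  }

  // Option for question 1: fail fast when any target could not be initialized.
  static void checkAllTargets(int nSuccess, int nTargets) throws IOException {
    if (nSuccess < nTargets) {
      throw new IOException(
          "Only " + nSuccess + " of " + nTargets + " targets were initialized.");
    }
  }

  // Option for question 2: tolerate partial failure and skip null writer slots.
  // The returned count exists only so the sketch can be checked.
  static int clearBuffers(Writer[] writers) {
    int cleared = 0;
    for (Writer writer : writers) {
      ByteBuffer targetBuffer = writer != null ? writer.getTargetBuffer() : null;
      if (targetBuffer != null) {
        targetBuffer.clear();
        cleared++;
      }
    }
    return cleared;
  }

  public static void main(String[] args) {
    // The middle slot is null, as it would be after a failed createWriter().
    Writer[] writers = {new Writer(), null, new Writer()};
    System.out.println("buffers cleared: " + clearBuffers(writers));
    try {
      checkAllTargets(2, writers.length);
    } catch (IOException e) {
      System.out.println("strict init rejects partial success: " + e.getMessage());
    }
  }
}
```

The strict check gives up on reconstruction as soon as one target fails, while the null guard lets the remaining targets proceed. Which behaviour is preferable depends on whether reconstructing a subset of the targets is still useful.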


> EC: fix NPE caused by StripedWriter.clearBuffers during reconstruct block
> -------------------------------------------------------------------------
>
>                 Key: HDFS-15779
>                 URL: https://issues.apache.org/jira/browse/HDFS-15779
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 3.2.0
>            Reporter: Hongbing Wang
>            Assignee: Hongbing Wang
>            Priority: Major
>
> The NullPointerException appears in the DN log as follows: 
> {code:java}
> 2020-12-28 15:49:25,453 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> DatanodeCommand action: DNA_ERASURE_CODING_RECOVERY
> //...
> 2020-12-28 15:51:25,551 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Connection timed out
> 2020-12-28 15:51:25,553 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Failed to reconstruct striped block: 
> BP-1922004198-10.83.xx.xx-1515033360950:blk_-9223372036804064064_6311920695
> java.lang.NullPointerException
>         at 
> org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedWriter.clearBuffers(StripedWriter.java:299)
>         at 
> org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReconstructor.clearBuffers(StripedBlockReconstructor.java:139)
>         at 
> org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReconstructor.reconstruct(StripedBlockReconstructor.java:115)
>         at 
> org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReconstructor.run(StripedBlockReconstructor.java:60)
>         at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> 2020-12-28 15:51:25,749 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Receiving 
> BP-1922004198-10.83.xx.xx-1515033360950:blk_-9223372036799445643_6313197139 
> src: /10.83.xxx.52:53198 dest: /10.83.xxx.52:50010
> {code}
> The NPE occurs at `writer.getTargetBuffer()` in the following code:
> {code:java}
> // StripedWriter#clearBuffers
> void clearBuffers() {
>   for (StripedBlockWriter writer : writers) {
>     ByteBuffer targetBuffer = writer.getTargetBuffer();
>     if (targetBuffer != null) {
>       targetBuffer.clear();
>     }
>   }
> }
> {code}
> So why is the writer null? Let's track when the writers are initialized and 
> when reconstruct() is called:
> {code:java}
> // StripedBlockReconstructor#run
> public void run() {
>   try {
>     initDecoderIfNecessary();
>     getStripedReader().init();
>     stripedWriter.init();  //①
>     reconstruct();  //②
>     stripedWriter.endTargetBlocks();
>   } catch (Throwable e) {
>     LOG.warn("Failed to reconstruct striped block: {}", getBlockGroup(), e);
>     // ...{code}
> The writers are initialized at ① and reconstruct() is called at ②. 
> `stripedWriter.init()` calls `initTargetStreams()`, as follows:
> {code:java}
> // StripedWriter#initTargetStreams
> int initTargetStreams() {
>   int nSuccess = 0;
>   for (short i = 0; i < targets.length; i++) {
>     try {
>       writers[i] = createWriter(i);
>       nSuccess++;
>       targetsStatus[i] = true;
>     } catch (Throwable e) {
>       LOG.warn(e.getMessage());
>     }
>   }
>   return nSuccess;
> }
> {code}
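> The NPE occurs when createWriter() throws for some but not all targets, so 
> that 0 < nSuccess < targets.length. The failed slots in `writers` stay null, 
> init() does not throw because nSuccess is not 0, and clearBuffers() then 
> dereferences a null writer.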
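
The failure path described in the issue can be reproduced in isolation. The sketch below is an assumption-laden reduction: `PartialInitSketch`, its `Writer` class, the `failing` flags, and `clearBuffersThrowsNpe` are hypothetical names introduced for illustration, and only the shape of `initTargetStreams()` follows the quoted method.

```java
// Hypothetical stand-ins for illustration; not the Hadoop classes.
class PartialInitSketch {
  static final class Writer {
    Object getTargetBuffer() {
      return null;
    }
  }

  private final Writer[] writers;
  // Assumption for the demo: marks which targets fail in createWriter().
  private final boolean[] failing;

  PartialInitSketch(boolean[] failing) {
    this.failing = failing;
    this.writers = new Writer[failing.length];
  }

  private Writer createWriter(int i) {
    if (failing[i]) {
      throw new IllegalStateException("cannot connect to target " + i);
    }
    return new Writer();
  }

  // Same shape as the quoted method: a failure is swallowed and writers[i] stays null.
  int initTargetStreams() {
    int nSuccess = 0;
    for (int i = 0; i < writers.length; i++) {
      try {
        writers[i] = createWriter(i);
        nSuccess++;
      } catch (Throwable e) {
        // Logged and ignored in the quoted code.
      }
    }
    return nSuccess;
  }

  // Iterates without a null check, as the quoted clearBuffers() does,
  // and reports whether that loop hit a NullPointerException.
  boolean clearBuffersThrowsNpe() {
    try {
      for (Writer writer : writers) {
        writer.getTargetBuffer();
      }
      return false;
    } catch (NullPointerException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    PartialInitSketch sketch = new PartialInitSketch(new boolean[] {false, true, false});
    int nSuccess = sketch.initTargetStreams();
    // nSuccess is neither 0 nor targets.length, so an "== 0" check lets it through.
    System.out.println("nSuccess = " + nSuccess);
    System.out.println("NPE in clearBuffers: " + sketch.clearBuffersThrowsNpe());
  }
}
```

With one failing target out of three, the `== 0` check passes and the unguarded loop fails on the empty slot, which matches the stack trace in the log above.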


