[
https://issues.apache.org/jira/browse/HDFS-15779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17276813#comment-17276813
]
Hui Fei commented on HDFS-15779:
--------------------------------
[~wanghongbing] Thanks, it makes sense. +1
Minor comment:
* It's better to add spaces around operators for code style.
> EC: fix NPE caused by StripedWriter.clearBuffers during reconstruct block
> -------------------------------------------------------------------------
>
> Key: HDFS-15779
> URL: https://issues.apache.org/jira/browse/HDFS-15779
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 3.2.0
> Reporter: Hongbing Wang
> Assignee: Hongbing Wang
> Priority: Major
> Attachments: HDFS-15779.001.patch
>
>
> The NullPointerException appears in the DataNode (DN) log as follows:
> {code:java}
> 2020-12-28 15:49:25,453 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeCommand action: DNA_ERASURE_CODING_RECOVERY
> //...
> 2020-12-28 15:51:25,551 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Connection timed out
> 2020-12-28 15:51:25,553 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Failed to reconstruct striped block: BP-1922004198-10.83.xx.xx-1515033360950:blk_-9223372036804064064_6311920695
> java.lang.NullPointerException
>     at org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedWriter.clearBuffers(StripedWriter.java:299)
>     at org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReconstructor.clearBuffers(StripedBlockReconstructor.java:139)
>     at org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReconstructor.reconstruct(StripedBlockReconstructor.java:115)
>     at org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReconstructor.run(StripedBlockReconstructor.java:60)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> 2020-12-28 15:51:25,749 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-1922004198-10.83.xx.xx-1515033360950:blk_-9223372036799445643_6313197139 src: /10.83.xxx.52:53198 dest: /10.83.xxx.52:50010
> {code}
> The NPE occurs at `writer.getTargetBuffer()` in the following code:
> {code:java}
> // StripedWriter#clearBuffers
> void clearBuffers() {
>   for (StripedBlockWriter writer : writers) {
>     ByteBuffer targetBuffer = writer.getTargetBuffer();
>     if (targetBuffer != null) {
>       targetBuffer.clear();
>     }
>   }
> }
> {code}
> So, why is the writer null? Let's track when the writer is initialized and
> when reconstruct() is called, as follows:
> {code:java}
> // StripedBlockReconstructor#run
> public void run() {
>   try {
>     initDecoderIfNecessary();
>     getStripedReader().init();
>     stripedWriter.init(); //①
>     reconstruct(); //②
>     stripedWriter.endTargetBlocks();
>   } catch (Throwable e) {
>     LOG.warn("Failed to reconstruct striped block: {}", getBlockGroup(), e);
>     // ...
> {code}
> The writers are initialized at ① and reconstruct() is called at ②.
> `stripedWriter.init()` calls `initTargetStreams()`, as follows:
> {code:java}
> // StripedWriter#initTargetStreams
> int initTargetStreams() {
>   int nSuccess = 0;
>   for (short i = 0; i < targets.length; i++) {
>     try {
>       writers[i] = createWriter(i);
>       nSuccess++;
>       targetsStatus[i] = true;
>     } catch (Throwable e) {
>       LOG.warn(e.getMessage());
>     }
>   }
>   return nSuccess;
> }
> {code}
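> The NPE occurs when createWriter() throws an exception for some targets but
> not all, i.e. 0 < nSuccess < targets.length: the failed slots in `writers`
> stay null, and clearBuffers() later dereferences them.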
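
The failure mode described above can be reproduced outside Hadoop with a small stand-alone sketch. Everything in it is hypothetical: `NullSafeClearSketch`, its nested `Writer` class, and the `creationSucceeds` flags are illustrative stand-ins, not Hadoop classes, and the guarded loop is only one possible way to avoid the NPE, not necessarily what HDFS-15779.001.patch does.

```java
import java.nio.ByteBuffer;

public class NullSafeClearSketch {
  /** Hypothetical stand-in for StripedBlockWriter; it only holds a target buffer. */
  static final class Writer {
    private final ByteBuffer targetBuffer = ByteBuffer.allocate(8);

    ByteBuffer getTargetBuffer() {
      return targetBuffer;
    }
  }

  /** Mimics initTargetStreams(): a slot whose creation fails is left null. */
  static Writer[] initWriters(boolean[] creationSucceeds) {
    Writer[] writers = new Writer[creationSucceeds.length];
    for (int i = 0; i < creationSucceeds.length; i++) {
      try {
        if (!creationSucceeds[i]) {
          throw new IllegalStateException("simulated createWriter failure for target " + i);
        }
        writers[i] = new Writer();
      } catch (RuntimeException e) {
        // As in the quoted code, the failure is swallowed; writers[i] stays null.
      }
    }
    return writers;
  }

  /** Same shape as the quoted clearBuffers(): dereferences every slot. */
  static void clearBuffersUnguarded(Writer[] writers) {
    for (Writer writer : writers) {
      ByteBuffer targetBuffer = writer.getTargetBuffer(); // NPE if writer is null
      if (targetBuffer != null) {
        targetBuffer.clear();
      }
    }
  }

  /** Guarded variant (an assumption, not the actual patch): skips null writers. */
  static int clearBuffersGuarded(Writer[] writers) {
    int cleared = 0;
    for (Writer writer : writers) {
      ByteBuffer targetBuffer = (writer != null) ? writer.getTargetBuffer() : null;
      if (targetBuffer != null) {
        targetBuffer.clear();
        cleared++;
      }
    }
    return cleared;
  }

  public static void main(String[] args) {
    // One of three targets fails, so 0 < nSuccess < targets.length.
    Writer[] writers = initWriters(new boolean[] {true, false, true});
    try {
      clearBuffersUnguarded(writers);
    } catch (NullPointerException e) {
      System.out.println("unguarded: NullPointerException");
    }
    System.out.println("guarded cleared: " + clearBuffersGuarded(writers));
  }
}
```

With one failed target out of three, the unguarded loop throws on the second slot, while the guarded loop clears the two buffers that exist.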