[
https://issues.apache.org/jira/browse/HDFS-15779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hongbing Wang updated HDFS-15779:
---------------------------------
Description:
A NullPointerException appears in the DataNode (DN) log as follows:
{code:java}
2020-12-28 15:49:25,453 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeCommand action: DNA_ERASURE_CODING_RECOVERY
//...
2020-12-28 15:51:25,551 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Connection timed out
2020-12-28 15:51:25,553 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Failed to reconstruct striped block: BP-1922004198-10.83.xx.xx-1515033360950:blk_-9223372036804064064_6311920695
java.lang.NullPointerException
    at org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedWriter.clearBuffers(StripedWriter.java:299)
    at org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReconstructor.clearBuffers(StripedBlockReconstructor.java:139)
    at org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReconstructor.reconstruct(StripedBlockReconstructor.java:115)
    at org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReconstructor.run(StripedBlockReconstructor.java:60)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
2020-12-28 15:51:25,749 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-1922004198-10.83.xx.xx-1515033360950:blk_-9223372036799445643_6313197139 src: /10.83.xxx.52:53198 dest: /10.83.xxx.52:50010
{code}
The NPE occurs at `writer.getTargetBuffer()` in the following code:
{code:java}
void clearBuffers() {
  for (StripedBlockWriter writer : writers) {
    ByteBuffer targetBuffer = writer.getTargetBuffer();
    if (targetBuffer != null) {
      targetBuffer.clear();
    }
  }
}
{code}
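To make the failure mode concrete, here is a minimal, self-contained sketch. It does not use the real HDFS classes: `ClearBuffersSketch` and its nested `Writer` are hypothetical stand-ins for `StripedWriter` and `StripedBlockWriter`. It shows that a single null element in the array is enough to trigger the exception, and that checking the element before dereferencing it is one possible guard (not necessarily the change adopted for this issue).

```java
import java.nio.ByteBuffer;

public class ClearBuffersSketch {
    /** Hypothetical stand-in for StripedBlockWriter; only the buffer accessor is modeled. */
    static final class Writer {
        private final ByteBuffer targetBuffer;

        Writer(ByteBuffer targetBuffer) {
            this.targetBuffer = targetBuffer;
        }

        ByteBuffer getTargetBuffer() {
            return targetBuffer;
        }
    }

    /**
     * Clears every available target buffer, skipping array slots that were
     * never initialized. Returns the number of buffers cleared.
     */
    static int clearBuffers(Writer[] writers) {
        int cleared = 0;
        for (Writer writer : writers) {
            // Guard the element itself, not only the buffer it may hold.
            ByteBuffer targetBuffer = writer != null ? writer.getTargetBuffer() : null;
            if (targetBuffer != null) {
                targetBuffer.clear();
                cleared++;
            }
        }
        return cleared;
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(8);
        buf.put((byte) 1);
        // Slot 1 models a writer that was never created; slot 2 a writer without a buffer.
        Writer[] writers = { new Writer(buf), null, new Writer(null) };
        System.out.println("buffers cleared: " + clearBuffers(writers));
    }
}
```

Without the `writer != null` check, the loop would throw on the second element, which matches the top frame of the stack trace above.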
So, why is the writer null? Let's track when the writer is initialized and when
reconstruct() is called, as follows:
{code:java}
// StripedBlockReconstructor#run
public void run() {
  try {
    initDecoderIfNecessary();
    getStripedReader().init();
    stripedWriter.init(); //①
    reconstruct(); //②
    stripedWriter.endTargetBlocks();
  } catch (Throwable e) {
    LOG.warn("Failed to reconstruct striped block: {}", getBlockGroup(), e);
    // ...
{code}
The writers are initialized at ① and `reconstruct()` is called at ②.
`stripedWriter.init()` calls `initTargetStreams()`, which fills the array
with `writers[i] = createWriter(i)`.
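The description stops before showing `initTargetStreams()`, so the following is only a sketch of how a slot in `writers` could stay null. It assumes that a failure while creating one writer is logged and skipped rather than aborting initialization, which would be consistent with the "Connection timed out" warning that precedes the NPE in the log. All names here (`PartialInitSketch`, the nested `Writer`, the `reachable` flags) are hypothetical and are not the HDFS implementation.

```java
import java.io.IOException;

public class PartialInitSketch {
    /** Hypothetical stand-in for a per-target writer. */
    static final class Writer {
        final int index;

        Writer(int index) {
            this.index = index;
        }
    }

    /** Hypothetical factory: fails for targets marked unreachable. */
    static Writer createWriter(int i, boolean[] reachable) throws IOException {
        if (!reachable[i]) {
            throw new IOException("Connection timed out");
        }
        return new Writer(i);
    }

    /**
     * Assumed behavior: try every target, log and skip the ones that fail,
     * and return how many writers were created. Failed slots remain null.
     */
    static int initTargetStreams(Writer[] writers, boolean[] reachable) {
        int nSuccess = 0;
        for (int i = 0; i < writers.length; i++) {
            try {
                writers[i] = createWriter(i, reachable);
                nSuccess++;
            } catch (IOException e) {
                System.err.println("WARN " + e.getMessage());
            }
        }
        return nSuccess;
    }

    public static void main(String[] args) {
        boolean[] reachable = { true, false };
        Writer[] writers = new Writer[reachable.length];
        int created = initTargetStreams(writers, reachable);
        System.out.println("created=" + created
            + ", writers[1] is null: " + (writers[1] == null));
    }
}
```

Under that assumption, initialization at ① can succeed overall while leaving a null entry behind, and any later loop over `writers` at ② has to tolerate it.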
> EC: fix NPE caused by StripedWriter.clearBuffers during reconstruct block
> -------------------------------------------------------------------------
>
> Key: HDFS-15779
> URL: https://issues.apache.org/jira/browse/HDFS-15779
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 3.2.0
> Reporter: Hongbing Wang
> Assignee: Hongbing Wang
> Priority: Major