[jira] [Commented] (HDFS-3332) NullPointerException in DN when directoryscanner is trying to report bad blocks

amith (JIRA) Mon, 30 Apr 2012 03:31:17 -0700

    [ https://issues.apache.org/jira/browse/HDFS-3332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13264867#comment-13264867 ]


amith commented on HDFS-3332:
-----------------------------

Hi Nicholas,
Please correct me if I am wrong :)

I started the NN with an HA configuration (nn1=40.95 and nn2=40.96; nn2 was 
not started).

I started only one NN and made it active, then wrote a file and corrupted it 
manually.
The directory scanner reports the bad block to every NN via BPServiceActor.

BPServiceActor#reportBadBlocks(ExtendedBlock block) does not check whether the 
DN has registered with the NN.
It builds the report from bpRegistration, which is still null for an actor 
that has not registered, and that causes the NPE.
{code}
 void reportBadBlocks(ExtendedBlock block) {
    DatanodeInfo[] dnArr = { new DatanodeInfo(bpRegistration) };
    LocatedBlock[] blocks = { new LocatedBlock(block, dnArr) }; 
{code}    
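
One way to avoid the NPE would be a null check before the registration is used. The following is only a minimal sketch under simplified assumptions: BadBlockReporterSketch and Registration are hypothetical stand-ins (not the Hadoop classes), and this is not necessarily the fix that should go into the patch.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified stand-in for BPServiceActor; NOT the real Hadoop class. */
public class BadBlockReporterSketch {

    /** Hypothetical placeholder for the DN registration object. */
    static final class Registration {
        final String storageId;

        Registration(String storageId) {
            this.storageId = storageId;
        }
    }

    // Stays null until register() has run, mirroring bpRegistration.
    private volatile Registration bpRegistration;

    // Records what would have been sent to the NN.
    private final List<String> reported = new ArrayList<>();

    /** Stands in for the second phase of the handshake. */
    void register(String storageId) {
        bpRegistration = new Registration(storageId);
    }

    /**
     * Returns true if the block was reported, false if the report was
     * skipped because this actor has not registered yet.
     */
    boolean reportBadBlocks(String blockId) {
        Registration reg = bpRegistration; // read the field once
        if (reg == null) {
            return false; // not registered: skip instead of throwing an NPE
        }
        reported.add(reg.storageId + ":" + blockId);
        return true;
    }

    List<String> reported() {
        return reported;
    }
}
```

With this guard, an actor whose NN is down simply skips the report, while the actor connected to the active NN still sends it.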


Why is bpRegistration null?

{code}
private void connectToNNAndHandshake() throws IOException {
    // get NN proxy
    bpNamenode = dn.connectToNN(nnAddr);

    // First phase of the handshake with NN - get the namespace
    // info.
    NamespaceInfo nsInfo = retrieveNamespaceInfo();
    
    // Verify that this matches the other NN in this HA pair.
    // This also initializes our block pool in the DN if we are
    // the first NN connection for this BP.
    bpos.verifyAndSetNamespaceInfo(nsInfo);
    
    // Second phase of the handshake with the NN.
    register();
  }
{code}

bpRegistration is assigned in the register() call. However, 
retrieveNamespaceInfo() is effectively an infinite loop that keeps retrying 
the version request for as long as the NN is unavailable:

{code}
NamespaceInfo retrieveNamespaceInfo() throws IOException {
    NamespaceInfo nsInfo = null;
    while (shouldRun()) {
      try {
        nsInfo = bpNamenode.versionRequest();
        LOG.debug(this + " received versionRequest response: " + nsInfo);
        break;
      } catch(SocketTimeoutException e) {  // namenode is busy
        LOG.warn("Problem connecting to server: " + nnAddr);
      } catch(IOException e ) {  // namenode is not available
        LOG.warn("Problem connecting to server: " + nnAddr);
      }
      
      // try again in a second
      sleepAndLogInterrupts(5000, "requesting version info from NN");
    }
    
    if (nsInfo != null) {
      checkNNVersion(nsInfo);
    } else {
      throw new IOException("DN shut down before block pool connected");
    }
    return nsInfo;
  }
{code}

So for the actor connected to the NN that was never started, register() is 
never reached and bpRegistration is never assigned.
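
To illustrate the ordering problem, here is a small self-contained sketch, assuming simplified stand-in types. HandshakeSketch, its NameNode interface, and the bounded attempt counter (used in place of shouldRun() so the example terminates) are all hypothetical; the real code also sleeps between attempts.

```java
import java.io.IOException;

/** Simplified model of the two-phase handshake; NOT the real Hadoop code. */
public class HandshakeSketch {

    /** Hypothetical stand-in for the NN proxy. */
    interface NameNode {
        String versionRequest() throws IOException;
    }

    private final NameNode nn;
    private int attemptsLeft; // stands in for shouldRun()
    String bpRegistration;    // null until the second phase runs

    HandshakeSketch(NameNode nn, int maxAttempts) {
        this.nn = nn;
        this.attemptsLeft = maxAttempts;
    }

    /** First phase: retry until the NN answers or we give up. */
    private String retrieveNamespaceInfo() throws IOException {
        String nsInfo = null;
        while (attemptsLeft-- > 0) {
            try {
                nsInfo = nn.versionRequest();
                break;
            } catch (IOException e) {
                // NN not available; the real code logs and sleeps here
            }
        }
        if (nsInfo == null) {
            throw new IOException("DN shut down before block pool connected");
        }
        return nsInfo;
    }

    void connectToNNAndHandshake() throws IOException {
        String nsInfo = retrieveNamespaceInfo();
        // Second phase: only reached once the first phase has succeeded.
        bpRegistration = "registered-with-" + nsInfo;
    }
}
```

When versionRequest() always fails, connectToNNAndHandshake() never gets past the first phase, so bpRegistration keeps its initial null value. That is the state the directory scanner runs into.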

                
> NullPointerException in DN when directoryscanner is trying to report bad blocks
> -------------------------------------------------------------------------------
>
>                 Key: HDFS-3332
>                 URL: https://issues.apache.org/jira/browse/HDFS-3332
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: data-node
>    Affects Versions: 3.0.0
>         Environment: HDFS
>            Reporter: amith
>            Assignee: amith
>             Fix For: 3.0.0
>
>
> There is 1 NN and 1 DN (the NN is started with an HA configuration).
> I corrupted 1 block and found the following:
> {code}
> 2012-04-27 09:59:01,214 INFO  datanode.DataNode (BPServiceActor.java:blockReport(401)) - BlockReport of 2 blocks took 0 msec to generate and 5 msecs for RPC and NN processing
> 2012-04-27 09:59:01,214 INFO  datanode.DataNode (BPServiceActor.java:blockReport(420)) - sent block report, processed command:org.apache.hadoop.hdfs.server.protocol.FinalizeCommand@3b756db3
> 2012-04-27 09:59:01,726 INFO  datanode.DirectoryScanner (DirectoryScanner.java:scan(390)) - BlockPool BP-2087868617-10.18.40.95-1335500488012 Total blocks: 2, missing metadata files:0, missing block files:0, missing blocks in memory:0, mismatched blocks:1
> 2012-04-27 09:59:01,727 WARN  impl.FsDatasetImpl (FsDatasetImpl.java:checkAndUpdate(1366)) - Updating size of block -4466699320171028643 from 1024 to 1034
> 2012-04-27 09:59:01,727 WARN  impl.FsDatasetImpl (FsDatasetImpl.java:checkAndUpdate(1374)) - Reporting the block blk_-4466699320171028643_1004 as corrupt due to length mismatch
> 2012-04-27 09:59:01,728 DEBUG ipc.Client (Client.java:sendParam(807)) - IPC Client (1957050620) connection to /10.18.40.95:8020 from root sending #257
> 2012-04-27 09:59:01,730 DEBUG ipc.Client (Client.java:receiveResponse(848)) - IPC Client (1957050620) connection to /10.18.40.95:8020 from root got value #257
> 2012-04-27 09:59:01,730 DEBUG ipc.ProtobufRpcEngine (ProtobufRpcEngine.java:invoke(193)) - Call: reportBadBlocks 2
> 2012-04-27 09:59:01,731 ERROR datanode.DirectoryScanner (DirectoryScanner.java:run(288)) - Exception during DirectoryScanner execution - will continue next cycle
> java.lang.NullPointerException
>       at org.apache.hadoop.hdfs.protocol.DatanodeID.<init>(DatanodeID.java:66)
>       at org.apache.hadoop.hdfs.protocol.DatanodeInfo.<init>(DatanodeInfo.java:87)
>       at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.reportBadBlocks(BPServiceActor.java:238)
>       at org.apache.hadoop.hdfs.server.datanode.BPOfferService.reportBadBlocks(BPOfferService.java:187)
>       at org.apache.hadoop.hdfs.server.datanode.DataNode.reportBadBlocks(DataNode.java:559)
>       at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.checkAndUpdate(FsDatasetImpl.java:1377)
>       at org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.reconcile(DirectoryScanner.java:318)
>       at org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.run(DirectoryScanner.java:284)
>       at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>       at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
>       at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
>       at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
>       at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:181)
>       at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:205)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>       at java.lang.Thread.run(Thread.java:619)
> {code}
> Here, when the directory scanner tried to report the bad block, we got an NPE.

