[ https://issues.apache.org/jira/browse/HDFS-15113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17059063#comment-17059063 ]

Wei-Chiu Chuang commented on HDFS-15113:
----------------------------------------

Thanks [~hexiaoqiao] for the nicely written test!

{code}
      DataNodeFaultInjector.set(new DataNodeFaultInjector() {
        public void blockUtilSendFullBlockReport() {
          try {
            Thread.sleep(200);
          } catch (InterruptedException e) {
            e.printStackTrace();
          }
        }
      });
{code}
Pausing the thread for a short duration and expecting the invariants to hold 
true is likely to produce lots of flaky failures.

{code}
      // Make sure that generate blocks for DataNode and IBR not empty now.
      Thread.sleep(200);
{code}
Ideally, use a synchronization primitive such as a Semaphore to guarantee this 
instead of sleeping.
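A minimal sketch of the Semaphore-based hand-off, in plain Java rather than the actual test: the worker thread releases a permit once its (hypothetical) blocks are generated, and the test thread waits for that permit with a bounded tryAcquire instead of Thread.sleep(200). The class, method, and variable names are illustrative assumptions, not part of the patch.

{code:java}
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class SemaphoreInsteadOfSleep {

  /** Returns true if the test thread observed the worker's state via the hand-off. */
  static boolean runOnce() {
    // Hypothetical stand-in for "blocks generated for the DataNode, IBR not empty".
    final int[] generatedBlocks = {0};
    final Semaphore blocksGenerated = new Semaphore(0); // released by the worker
    final Semaphore mayContinue = new Semaphore(0);     // released by the test

    Thread worker = new Thread(() -> {
      generatedBlocks[0] = 3;       // pretend three blocks were added
      blocksGenerated.release();    // signal: the invariant now holds
      try {
        mayContinue.acquire();      // park here, like the injected pause
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }, "worker");
    worker.start();

    boolean sawBlocks = false;
    try {
      // Instead of Thread.sleep(200): wait, with an upper bound, for the signal.
      // release() happens-before a successful acquire, so the write is visible.
      if (blocksGenerated.tryAcquire(10, TimeUnit.SECONDS)) {
        sawBlocks = generatedBlocks[0] > 0;
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    } finally {
      mayContinue.release();        // let the worker finish
    }

    try {
      worker.join(10_000);          // bounded join before returning
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return sawBlocks && !worker.isAlive();
  }

  public static void main(String[] args) {
    System.out.println(runOnce()); // prints true
  }
}
{code}

In the real test, the injected blockUtilSendFullBlockReport() could block on such a semaphore instead of sleeping; exactly how the patch should wire that up is left open here.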


{code}
addNewBlockThread.start();
{code}
As a good habit, join() the thread at the end of the test.
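A minimal sketch of that habit, written as plain Java without a test framework: the helper thread is started, the test body runs inside try, and the thread is joined in finally with a timeout so a hung helper cannot outlive the test. Only the name addNewBlockThread comes from the patch; the thread body and the helper joinQuietly are hypothetical placeholders.

{code:java}
public class JoinAtEndOfTest {

  static final long JOIN_TIMEOUT_MS = 10_000;

  /** Joins with a bound and restores the interrupt flag if interrupted. */
  static void joinQuietly(Thread t) {
    try {
      t.join(JOIN_TIMEOUT_MS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  /** Returns true if the helper thread finished and its work is visible. */
  static boolean runTest() {
    final int[] blocksAdded = {0};
    Thread addNewBlockThread = new Thread(() -> {
      for (int i = 0; i < 5; i++) {
        blocksAdded[0]++;           // placeholder for "add a new block"
      }
    }, "addNewBlockThread");

    addNewBlockThread.start();
    try {
      // ... the real test's actions and assertions would go here ...
    } finally {
      // Runs even if an assertion above throws, so the thread never leaks
      // into later tests.
      joinQuietly(addNewBlockThread);
    }
    // Observing termination (isAlive() == false) happens-after the thread's
    // last action, so blocksAdded is safely visible here.
    return !addNewBlockThread.isAlive() && blocksAdded[0] == 5;
  }

  public static void main(String[] args) {
    System.out.println(runTest()); // prints true
  }
}
{code}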

{code}
Mockito.doAnswer((Answer<Object>) invocation -> {
      Object[] arguments = invocation.getArguments();
      StorageReceivedDeletedBlocks[] list =
          (StorageReceivedDeletedBlocks[])arguments[2];
      setIncreaseBlockReportCount(list[0].getBlocks().length);
      return null;
    }).when(mockNN).blockReceivedAndDeleted(
{code}
Here we assume the method is only ever used to add blocks and never to test 
deleting blocks. This can confuse future test writers; I suggest adding a 
comment.
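One way to act on this, sketched in plain Java with local stand-ins (BlockStatus and ReportedBlock below are hypothetical simplifications, not the Hadoop protocol classes): document the add-only assumption next to the counting logic and, optionally, skip deletion entries so the counter stays correct if the mock is later reused for deletion tests. Note that the quoted answer counts every entry via getBlocks().length; the filter is an extra design choice, not what the patch does.

{code:java}
import java.util.Arrays;
import java.util.List;

public class CountReceivedBlocks {

  // Hypothetical stand-ins for the entries carried in an incremental block report.
  enum BlockStatus { RECEIVING_BLOCK, RECEIVED_BLOCK, DELETED_BLOCK }

  static final class ReportedBlock {
    final long blockId;
    final BlockStatus status;

    ReportedBlock(long blockId, BlockStatus status) {
      this.blockId = blockId;
      this.status = status;
    }
  }

  /**
   * NOTE for future test writers: this test only ever reports *added* blocks
   * through blockReceivedAndDeleted(). Deletion entries are skipped here so
   * that reusing the mock in a deletion test does not silently inflate the
   * count; adjust this if deletions must be counted.
   */
  static int countAddedBlocks(List<ReportedBlock> report) {
    int added = 0;
    for (ReportedBlock b : report) {
      if (b.status != BlockStatus.DELETED_BLOCK) {
        added++;
      }
    }
    return added;
  }

  public static void main(String[] args) {
    List<ReportedBlock> report = Arrays.asList(
        new ReportedBlock(1L, BlockStatus.RECEIVED_BLOCK),
        new ReportedBlock(2L, BlockStatus.RECEIVING_BLOCK),
        new ReportedBlock(3L, BlockStatus.DELETED_BLOCK));
    System.out.println(countAddedBlocks(report)); // prints 2
  }
}
{code}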

On the condition that we address these in the future, I am +1. I don't want to 
hold up a release because of my picky comments on the tests. Will commit later.

> Missing IBR when NameNode restart if open processCommand async feature
> ----------------------------------------------------------------------
>
>                 Key: HDFS-15113
>                 URL: https://issues.apache.org/jira/browse/HDFS-15113
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>            Reporter: Xiaoqiao He
>            Assignee: Xiaoqiao He
>            Priority: Blocker
>         Attachments: HDFS-15113.001.patch, HDFS-15113.002.patch, 
> HDFS-15113.003.patch, HDFS-15113.004.patch, HDFS-15113.005.patch
>
>
> Recently, I encountered a case where the NameNode was missing blocks after a
> restart; it is related to HDFS-14997.
> a. While the NameNode is restarting, it returns the `DNA_REGISTER` command to
> a DataNode when it receives certain RPC requests from that DataNode.
> b. When the DataNode receives the `DNA_REGISTER` command, it runs #reRegister
> asynchronously.
> {code:java}
>   void reRegister() throws IOException {
>     if (shouldRun()) {
>       // re-retrieve namespace info to make sure that, if the NN
>       // was restarted, we still match its version (HDFS-2120)
>       NamespaceInfo nsInfo = retrieveNamespaceInfo();
>       // and re-register
>       register(nsInfo);
>       scheduler.scheduleHeartbeat();
>       // HDFS-9917,Standby NN IBR can be very huge if standby namenode is down
>       // for sometime.
>       if (state == HAServiceState.STANDBY
>           || state == HAServiceState.OBSERVER) {
>         ibrManager.clearIBRs();
>       }
>     }
>   }
> {code}
> c. #register triggers a full block report (FBR) immediately.
> d. Because #reRegister runs asynchronously, there is no guarantee which
> happens first: sending the FBR or clearing the IBRs. If the IBRs are cleared
> first, nothing is lost. But if the FBR is sent first and the IBRs are cleared
> afterwards, blocks received between these two points are missing on the
> NameNode until the next FBR.
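To make the ordering in step d concrete, here is a deliberately simplified, single-threaded model (not Hadoop code; Model and its methods are hypothetical). It assumes the clearIBRs() branch in reRegister is taken, i.e. the NameNode is in STANDBY or OBSERVER state, and replays the two possible interleavings of the FBR and clearIBRs() by hand.

{code:java}
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class FbrIbrOrdering {

  static final class Model {
    final Set<Long> onDisk = new HashSet<>();        // blocks the DataNode really has
    final List<Long> pendingIbr = new ArrayList<>(); // received, not yet reported
    final Set<Long> nameNodeView = new HashSet<>();  // what the NameNode knows

    void receiveBlock(long id) { onDisk.add(id); pendingIbr.add(id); }

    // A full block report replaces the NameNode's view with the on-disk set.
    void sendFbr() { nameNodeView.clear(); nameNodeView.addAll(onDisk); }

    // Like ibrManager.clearIBRs(): pending entries are dropped, never sent.
    void clearIbrs() { pendingIbr.clear(); }

    // Sending IBRs adds the pending blocks to the NameNode's view.
    void sendIbrs() { nameNodeView.addAll(pendingIbr); pendingIbr.clear(); }

    Set<Long> missingOnNameNode() {
      Set<Long> missing = new HashSet<>(onDisk);
      missing.removeAll(nameNodeView);
      return missing;
    }
  }

  /** FBR first, then clearIBRs(): the block received in between is lost. */
  static Set<Long> fbrThenClear() {
    Model m = new Model();
    m.receiveBlock(1);
    m.sendFbr();          // #register triggers the FBR
    m.receiveBlock(2);    // arrives after the FBR snapshot
    m.clearIbrs();        // async reRegister clears the IBRs afterwards
    m.sendIbrs();         // nothing left to send
    return m.missingOnNameNode();
  }

  /** clearIBRs() first, then FBR: every block reaches the NameNode. */
  static Set<Long> clearThenFbr() {
    Model m = new Model();
    m.receiveBlock(1);
    m.clearIbrs();
    m.sendFbr();
    m.receiveBlock(2);
    m.sendIbrs();
    return m.missingOnNameNode();
  }

  public static void main(String[] args) {
    System.out.println("FBR then clear: missing " + fbrThenClear()); // [2]
    System.out.println("clear then FBR: missing " + clearThenFbr()); // []
  }
}
{code}

With the FBR first, block 2 exists on the DataNode but the NameNode does not learn about it until the next FBR; with the clear first, nothing is missing. The model only illustrates the race and says nothing about how the patch fixes it.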



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
