[ 
https://issues.apache.org/jira/browse/HDFS-17156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17755265#comment-17755265
 ] 

ASF GitHub Bot commented on HDFS-17156:
---------------------------------------

simbadzina commented on PR #5951:
URL: https://github.com/apache/hadoop/pull/5951#issuecomment-1681305358

   I've created a unit test here: 
https://github.com/simbadzina/hadoop/pull/1/commits/d216bfe8c5f506f08263cf490479a0398b097f05
   
   ```
     /**
      * Verify that the stateID in the response is passed to the
      * AlignmentContext before the caller is notified that the call completed.
      * @throws IOException if the RPC call fails
      */
     @Test(timeout=60000)
     public void testReceiveStateBeforeCallerNotification() throws IOException {
       AtomicBoolean stateReceived = new AtomicBoolean(false);
       AlignmentContext alignmentContext = Mockito.mock(AlignmentContext.class);
       Mockito.doAnswer((Answer<Void>) invocation -> {
         Thread.sleep(1000);
         stateReceived.set(true);
         return null;
       }).when(alignmentContext)
           .receiveResponseState(any(RpcHeaderProtos.RpcResponseHeaderProto.class));
   
       final Client client = Mockito.spy(new Client(LongWritable.class, conf));
       final TestServer server = new TestServer(1, false);
   
       try {
         InetSocketAddress addr = NetUtils.getConnectAddress(server);
         server.start();
         call(client, new LongWritable(RANDOM.nextLong()), addr,
             0, conf, alignmentContext);
         Assert.assertTrue(stateReceived.get());
       } finally {
         client.stop();
         server.stop();
       }
     }
   ```




> mapreduce job encounters java.io.IOException
> --------------------------------------------
>
>                 Key: HDFS-17156
>                 URL: https://issues.apache.org/jira/browse/HDFS-17156
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: rbf
>            Reporter: Chunyi Yang
>            Assignee: Chunyi Yang
>            Priority: Minor
>              Labels: Observer, RBF, pull-request-available
>
> While executing a mapreduce job in an environment utilizing Router-Based 
> Federation with Observer read enabled, there is an estimated 1% chance of 
> encountering the following error.
> {code:java}
> "java.io.IOException: Resource 
> hdfs://XXXX/user/XXXX/.staging/job_XXXXXX/.tez/application_XXXXXX/tez-conf.pb 
> changed on src filesystem - expected: \"2023-07-07T12:41:16.801+0900\", was: 
> \"2023-07-07T12:41:16.822+0900\", current time: 
> \"2023-07-07T12:41:22.386+0900\"",
> {code}
> This error occurs in the verifyAndCopy function of FSDownload.java when the 
> NodeManager tries to download a file right after the file has been written to 
> HDFS. The write operation runs on the active namenode and the read operation 
> runs on the observer namenode, as expected.
> The edits file and hdfs-audit files indicate that the expected timestamp 
> mentioned in the error message aligns with the OP_CLOSE MTIME of the 
> 'tez-conf.pb' file (which is correct). However, the actual timestamp 
> retrieved from the read operation corresponds to the OP_ADD MTIME of the 
> target 'tez-conf.pb' file (which is incorrect). This inconsistency suggests 
> that the observer namenode responds to the client before its edits file is 
> updated with the latest stateId.
> Further troubleshooting has revealed that during write operations, the router 
> responds to the client before receiving the latest stateId from the active 
> namenode. Consequently, the outdated stateId is then used in the subsequent 
> read operation on the observer namenode, leading to inaccuracies in the 
> information provided by the observer namenode.
> To resolve this issue, it is essential to ensure that the router sends a 
> response to the client only after receiving the latest stateId from the 
> active namenode.
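
The ordering requirement described above can be illustrated with a minimal, self-contained Java sketch. It is not the router's or the IPC client's actual code: `StateTracker`, `PendingCall`, `handleResponse`, and `simulateWrite` are hypothetical names introduced for illustration only, and a `CountDownLatch` stands in for whatever mechanism wakes the waiting caller. The sketch shows the response handler recording the state id first and notifying the caller second, so the caller observes the latest state id as soon as it wakes up.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch; all names are illustrative and not Hadoop's API. */
final class StateIdOrderingSketch {

  /** Stand-in for the component that remembers the last seen namenode state id. */
  static final class StateTracker {
    private final AtomicLong lastSeenStateId = new AtomicLong(0);

    void receiveResponseState(long stateId) {
      // Keep the maximum so an older response never moves the state backwards.
      lastSeenStateId.accumulateAndGet(stateId, Math::max);
    }

    long getLastSeenStateId() {
      return lastSeenStateId.get();
    }
  }

  /** Stand-in for a pending RPC call that a caller thread is waiting on. */
  static final class PendingCall {
    private final CountDownLatch done = new CountDownLatch(1);

    void complete() {
      done.countDown();
    }

    void await() throws InterruptedException {
      done.await();
    }
  }

  /** Records the state id first, then wakes the caller. The order is the point. */
  static void handleResponse(PendingCall call, StateTracker tracker, long responseStateId) {
    tracker.receiveResponseState(responseStateId);
    call.complete();
  }

  /** Simulates one write and returns the state id the caller sees once notified. */
  static long simulateWrite(long responseStateId) {
    StateTracker tracker = new StateTracker();
    PendingCall call = new PendingCall();
    Thread responder = new Thread(() -> handleResponse(call, tracker, responseStateId));
    responder.start();
    try {
      call.await();
      // This value would be sent along with the next (observer) read.
      long observed = tracker.getLastSeenStateId();
      responder.join();
      return observed;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting for the response", e);
    }
  }

  public static void main(String[] args) {
    System.out.println(simulateWrite(42L)); // prints 42
  }
}
```

Swapping the two statements in `handleResponse` would reintroduce the window described in the issue, in which the caller can wake up and issue its next read before the state id has been recorded.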



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
