[ https://issues.apache.org/jira/browse/HADOOP-18847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chunyi Yang updated HADOOP-18847: --------------------------------- Description: While executing a mapreduce job in an environment utilizing Router-Based Federation with Observer read enabled, there is an estimated 1% chance of encountering the following error. {code:java} "java.io.IOException: Resource hdfs://XXXX/user/XXXX/.staging/job_XXXXXX/.tez/application_XXXXXX/tez-conf.pb changed on src filesystem - expected: \"2023-07-07T12:41:16.801+0900\", was: \"2023-07-07T12:41:16.822+0900\", current time: \"2023-07-07T12:41:22.386+0900\"", {code} This error happens in function verifyAndCopy inside FSDownload.java when nodemanager tries to download a file right after the file has been written to the HDFS. The write operation runs on active namenode and read operation runs on observer namenode as expected. The edits file and hdfs-audit files show that the expected time seen in error message is the OP_CLOSE MTIME of the tez-conf.pb file(which is correct) while the actual timestamp it gets from the read operation is OP_ADD MTIME of the target tez-conf.pf file (which is wrong). This mismatch shows that the observer namenode responses to client before its edits file updates to the lastest stateid which causes the issue. Further troubleshooting has revealed that during write operations, the router responds to the client before receiving the latest stateId from the active namenode. Consequently, the outdated stateId is then used in the subsequent read operation on the observer namenode, leading to inaccuracies in the information provided by the observer namenode. To resolve this issue, it is essential to ensure that the router sends a response to the client only after receiving the latest stateId from the active namenode. was: When running mapreduce job in a Router-Based Federation+Observer read enabled environment, we see approximately a 1% probability of encountering the following error. {code:java} "java.io.IOException: Resource hdfs://XXXX/user/XXXX/.staging/job_XXXXXX/.tez/application_XXXXXX/tez-conf.pb changed on src filesystem - expected: \"2023-07-07T12:41:16.801+0900\", was: \"2023-07-07T12:41:16.822+0900\", current time: \"2023-07-07T12:41:22.386+0900\"", {code} This error happens in function verifyAndCopy inside FSDownload.java when nodemanager tries to download a file right after the file has been written to the HDFS. The write operation runs on active namenode and read operation runs on observer namenode as expected. The edits file and hdfs-audit files show that the expected time seen in error message is the OP_CLOSE MTIME of the tez-conf.pb file(which is correct) while the actual timestamp it gets from the read operation is OP_ADD MTIME of the target tez-conf.pf file (which is wrong). This mismatch shows that the observer namenode responses to client before its edits file updates to the lastest stateid which causes the issue. > mapreduce job encounters java.io.IOException when > dfs.client.rbf.observer.read.enable is true > --------------------------------------------------------------------------------------------- > > Key: HADOOP-18847 > URL: https://issues.apache.org/jira/browse/HADOOP-18847 > Project: Hadoop Common > Issue Type: Bug > Components: common > Reporter: Chunyi Yang > Priority: Minor > > While executing a mapreduce job in an environment utilizing Router-Based > Federation with Observer read enabled, there is an estimated 1% chance of > encountering the following error. > {code:java} > "java.io.IOException: Resource > hdfs://XXXX/user/XXXX/.staging/job_XXXXXX/.tez/application_XXXXXX/tez-conf.pb > changed on src filesystem - expected: \"2023-07-07T12:41:16.801+0900\", was: > \"2023-07-07T12:41:16.822+0900\", current time: > \"2023-07-07T12:41:22.386+0900\"", > {code} > This error happens in function verifyAndCopy inside FSDownload.java when > nodemanager tries to download a file right after the file has been written to > the HDFS. The write operation runs on active namenode and read operation runs > on observer namenode as expected. > The edits file and hdfs-audit files show that the expected time seen in error > message is the OP_CLOSE MTIME of the tez-conf.pb file(which is correct) while > the actual timestamp it gets from the read operation is OP_ADD MTIME of the > target tez-conf.pf file (which is wrong). This mismatch shows that the > observer namenode responses to client before its edits file updates to the > lastest stateid which causes the issue. > Further troubleshooting has revealed that during write operations, the router > responds to the client before receiving the latest stateId from the active > namenode. Consequently, the outdated stateId is then used in the subsequent > read operation on the observer namenode, leading to inaccuracies in the > information provided by the observer namenode. > To resolve this issue, it is essential to ensure that the router sends a > response to the client only after receiving the latest stateId from the > active namenode. -- This message was sent by Atlassian Jira (v8.20.10#820010) --------------------------------------------------------------------- To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org