shuaiqig commented on PR #6236: URL: https://github.com/apache/hadoop/pull/6236#issuecomment-1801240080
> > @shuaiqig Thanks for your report. But I am confused about why uploading an fsimage from the Standby could hold the write lock on the Active NameNode side. Did you print any stack traces? Thanks again. BTW, please update the description, which was copied from JIRA.
>
> Thanks @shuaiqig for your report, and thanks @Hexiaoqiao for your comments.
>
> There are some real problems here. When the SNN does a `checkpoint`, it uploads an `fsimage`, which may be tens of gigabytes. This makes the disk where the ANN metadata is stored very busy.
>
> When `rollEditLog()` is called, the ANN writes to `seen_txid` in both `dfs.namenode.name.dir` and `dfs.namenode.edits.dir` (regardless of whether they are isolated or not) while holding the `write lock`. If disk I/O utilization (ioutil) is high, even writing the small file `seen_txid` takes a long time, which indirectly causes the ANN to hold the write lock for a long time.
>
> We added a separate lock to achieve mutual exclusion.

Thanks for your answer, @tomscut. The cause of this problem puzzled me for a long time, and I did a lot of debugging without finding it. You mentioned "we added a separate lock to achieve mutual exclusion". Could you tell me where I can find this patch?
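The mechanism in the quoted reply can be illustrated with a small Java sketch. This sketch is purely hypothetical. The thread does not link the actual patch, and the names `CheckpointIoLockSketch`, `checkpointIoLock`, `uploadImage` and `tryRead` are illustrative stand-ins rather than HDFS APIs. `rollEditLog` here only models the real method. The sketch assumes one plausible design. A dedicated lock is held by the fsimage upload while it writes to disk. The edit-log roll acquires that lock *before* the namespace write lock. As a result, a roll that has to wait for a busy disk does so without blocking other namespace operations.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/**
 * Hypothetical model, NOT the actual HDFS patch. It mimics an Active NameNode
 * where rollEditLog() persists seen_txid under the namespace write lock, and an
 * fsimage upload handler performs heavy disk I/O.
 */
class CheckpointIoLockSketch {
  // Stands in for the FSNamesystem read/write lock.
  private final ReentrantReadWriteLock namespaceLock = new ReentrantReadWriteLock(true);
  // Hypothetical dedicated lock that serializes fsimage upload and seen_txid writes.
  private final ReentrantLock checkpointIoLock = new ReentrantLock(true);
  private long seenTxId = 0;

  /** Simulates receiving a large fsimage from the Standby NameNode. */
  public void uploadImage(Runnable heavyDiskWrite) {
    checkpointIoLock.lock();
    try {
      // Long-running I/O; the namespace lock is NOT held here.
      heavyDiskWrite.run();
    } finally {
      checkpointIoLock.unlock();
    }
  }

  /** Simulates rollEditLog(): persists seen_txid under the namespace write lock. */
  public long rollEditLog() {
    // Any waiting for an in-progress upload happens here, outside the write lock.
    checkpointIoLock.lock();
    try {
      namespaceLock.writeLock().lock();
      try {
        seenTxId++; // stands in for writing the small seen_txid file
        return seenTxId;
      } finally {
        namespaceLock.writeLock().unlock();
      }
    } finally {
      checkpointIoLock.unlock();
    }
  }

  /** A namespace read (e.g. a getFileInfo-style call) that only needs the read lock. */
  public boolean tryRead(long timeoutMs) throws InterruptedException {
    if (namespaceLock.readLock().tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
      namespaceLock.readLock().unlock();
      return true;
    }
    return false;
  }
}
```

The lock ordering is the point of this design. If the dedicated lock were taken *inside* the write lock, a roll that arrived during an upload would still stall every namespace operation until the upload finished.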
