tomscut commented on PR #4219: URL: https://github.com/apache/hadoop/pull/4219#issuecomment-1113892494
> This seems right to me, but I don't fully understand what went wrong to cause the error. Can you explain more fully? Why did we previously make the assumption that `INVALID_TXID` meant in-progress, and what has changed to make that not true / what happened in your specific scenario to cause that not to be true?

Thank you very much for your review, @xkrogen.

After introducing [SBN READ], we updated the configuration to `dfs.ha.tail-edits.in-progress=true`. Now, when we run `bootstrapStandby`, we encounter something like this:

1. We need to start an Observer Namenode, so we execute `bootstrapStandby` before starting it. This automatically pulls the latest FSImage from the Active Namenode and checks whether the edits in the journals have a gap, based on the `lastTxid` of the FSImage.
2. Assume that the txid of the latest FSImage is x, and that the edit logs from x onward in the journals are in the `InProgress` state. `FSEditLog#checkForGaps` will be skipped, because the `lastTxId` of the InProgress EditLogInputStream is not `HdfsServerConstants.INVALID_TXID` but a specific number.
3. However, when there is a finalized edit log between x and the txid currently being written, `bootstrapStandby` can execute normally.

The `lastTxId` of an InProgress EditLogInputStream is not always `HdfsServerConstants.INVALID_TXID`; it can also be a specific number.
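To make the point about `lastTxId` concrete, here is a minimal, self-contained sketch of a gap check over edit log segments. It is not the Hadoop implementation: `GapCheckSketch`, `Segment`, and `coversRange` are hypothetical names invented for illustration, and the `INVALID_TXID` constant is only a stand-in for `HdfsServerConstants.INVALID_TXID`. The sketch assumes that an in-progress segment may report either the sentinel or a concrete last txid, and it handles both cases rather than treating the sentinel as the only sign of an in-progress segment.

```java
import java.util.Arrays;
import java.util.List;

class GapCheckSketch {
    // Stand-in for HdfsServerConstants.INVALID_TXID (assumption of this sketch);
    // all that matters here is that it never equals a real txid.
    static final long INVALID_TXID = -12345L;

    // Hypothetical, simplified model of one edit log segment.
    static final class Segment {
        final long firstTxId;
        final long lastTxId;      // may be INVALID_TXID or a concrete number
        final boolean inProgress;

        Segment(long firstTxId, long lastTxId, boolean inProgress) {
            this.firstTxId = firstTxId;
            this.lastTxId = lastTxId;
            this.inProgress = inProgress;
        }
    }

    // Returns true if the ordered segments cover [fromTxId, toAtLeastTxId] with no gap.
    static boolean coversRange(List<Segment> segments, long fromTxId, long toAtLeastTxId) {
        long next = fromTxId;
        for (Segment s : segments) {
            if (next > toAtLeastTxId) {
                return true;              // target already covered
            }
            if (s.firstTxId > next) {
                return false;             // txids between next and firstTxId are missing
            }
            if (s.inProgress && s.lastTxId == INVALID_TXID) {
                return true;              // end unknown; it may reach the target
            }
            // Finalized segment, or in-progress segment with a concrete lastTxId.
            next = s.lastTxId + 1;
        }
        return next > toAtLeastTxId;
    }

    public static void main(String[] args) {
        Segment finalized = new Segment(1, 100, false);
        // In-progress segment that reports the sentinel.
        System.out.println(coversRange(
            Arrays.asList(finalized, new Segment(101, INVALID_TXID, true)), 1, 150)); // true
        // In-progress segment that reports a concrete last txid.
        System.out.println(coversRange(
            Arrays.asList(finalized, new Segment(101, 180, true)), 1, 150));          // true
        // A real gap: txids 101..119 are missing.
        System.out.println(coversRange(
            Arrays.asList(finalized, new Segment(120, 180, true)), 1, 150));          // false
    }
}
```

The second call in `main` is the case described in the comment: the in-progress segment carries a specific number, so a check that recognizes in-progress segments only by the sentinel would not take the "end unknown" branch for it.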
