[
https://issues.apache.org/jira/browse/HDFS-16557?focusedWorklogId=764587&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-764587
]
ASF GitHub Bot logged work on HDFS-16557:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 30/Apr/22 02:12
Start Date: 30/Apr/22 02:12
Worklog Time Spent: 10m
Work Description: tomscut commented on PR #4219:
URL: https://github.com/apache/hadoop/pull/4219#issuecomment-1113892494
> This seems right to me, but I don't fully understand what went wrong to
> cause the error. Can you explain more fully? Why did we previously make the
> assumption that `INVALID_TXID` meant in-progress, and what has changed to make
> that not true / what happened in your specific scenario to cause that not to be
> true?
Thank you @xkrogen very much for your review.
After introducing [SBN READ], we set
`dfs.ha.tail-edits.in-progress=true`.
Then, when we run `bootstrapStandby`, we encounter something like this:
1. We need to start an Observer Namenode, so we execute `bootstrapStandby`
before starting it. This automatically pulls the latest FSImage from the
Active Namenode and checks whether the edits in the journals have a gap, based
on the `lastTxid` of the FSImage.
2. Assume that the txid of the latest FSImage is x and that the edit logs from x
in the journals are in the `InProgress` state. The in-progress handling in
`FSEditLog#checkForGaps` will be skipped, because the `lastTxid` of the
InProgress EditLogInputStream is not `HdfsServerConstants.INVALID_TXID` but a
specific number.
3. However, if there is a finalized edit log between x and the txid currently
being written, `bootstrapStandby` can execute normally.
The `lastTxId` of an InProgress EditLogInputStream is not always
`HdfsServerConstants.INVALID_TXID`; it can also be a specific number.
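
To make the scenario concrete, here is a minimal, self-contained sketch of the two ways of deciding whether a stream is in progress. It is not the Hadoop code: `GapCheckSketch`, `Segment`, `coversRange`, and the `useFlag` switch are hypothetical names for illustration, the `INVALID_TXID` value is a stand-in, and the txids (100, 150, 160) are made up. It assumes the in-progress stream reports a concrete `lastTxId` that is already behind the txid on the Active Namenode.

```java
import java.util.Arrays;
import java.util.List;

public class GapCheckSketch {
    // Stand-in for HdfsServerConstants.INVALID_TXID; the value is illustrative only.
    static final long INVALID_TXID = -1L;

    /** Hypothetical, simplified stand-in for an EditLogInputStream. */
    static final class Segment {
        final long firstTxId;
        final long lastTxId;
        final boolean inProgress;

        Segment(long firstTxId, long lastTxId, boolean inProgress) {
            this.firstTxId = firstTxId;
            this.lastTxId = lastTxId;
            this.inProgress = inProgress;
        }
    }

    /**
     * Returns true if the segments cover [fromTxId, toAtLeastTxId] without a gap.
     * With useFlag == false, a segment counts as in-progress only when its
     * lastTxId equals INVALID_TXID; with useFlag == true, the explicit
     * inProgress flag decides.
     */
    static boolean coversRange(List<Segment> segments, long fromTxId,
                               long toAtLeastTxId, boolean useFlag) {
        long txId = fromTxId;
        for (Segment s : segments) {
            if (txId > toAtLeastTxId) {
                return true;
            }
            if (s.firstTxId > txId) {
                return false; // gap before this segment
            }
            boolean open = useFlag ? s.inProgress : s.lastTxId == INVALID_TXID;
            if (open) {
                // The end of an in-progress segment is unknown,
                // so it may well reach toAtLeastTxId.
                return true;
            }
            txId = s.lastTxId + 1;
        }
        return txId > toAtLeastTxId;
    }

    public static void main(String[] args) {
        // FSImage at txid 100; the in-progress segment reports lastTxId 150,
        // while the Active Namenode has already advanced to txid 160.
        List<Segment> segments = Arrays.asList(new Segment(101, 150, true));
        System.out.println(coversRange(segments, 101, 160, false)); // false: reported as a gap
        System.out.println(coversRange(segments, 101, 160, true));  // true: in-progress is accepted
    }
}
```

In this model, checking the sentinel treats the in-progress segment as finalized and reports a gap from txid 151, whereas checking the flag accepts it.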
Issue Time Tracking
-------------------
Worklog Id: (was: 764587)
Time Spent: 1h 10m (was: 1h)
> BootstrapStandby failed because of checking gap for inprogress
> EditLogInputStream
> ---------------------------------------------------------------------------------
>
> Key: HDFS-16557
> URL: https://issues.apache.org/jira/browse/HDFS-16557
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Tao Li
> Assignee: Tao Li
> Priority: Major
> Labels: pull-request-available
> Attachments: image-2022-04-22-17-17-14-577.png,
> image-2022-04-22-17-17-14-618.png, image-2022-04-22-17-17-23-113.png,
> image-2022-04-22-17-17-32-487.png
>
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> The lastTxId of an in-progress EditLogInputStream isn't necessarily
> HdfsServerConstants.INVALID_TXID. We can determine its status directly by
> EditLogInputStream#isInProgress.
> We introduced [SBN READ], and set
> {color:#ff0000}{{dfs.ha.tail-edits.in-progress=true}}{color}. Then, during
> bootstrapStandby, the in-progress EditLogInputStream is misjudged,
> resulting in a gap check failure, which causes bootstrapStandby to fail.
> hdfs namenode -bootstrapStandby
> !image-2022-04-22-17-17-32-487.png|width=766,height=161!
> !image-2022-04-22-17-17-14-577.png|width=598,height=187!