[ https://issues.apache.org/jira/browse/HDFS-17631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17888834#comment-17888834 ]
ASF GitHub Bot commented on HDFS-17631:
---------------------------------------

LiuGuH commented on PR #7066:
URL: https://github.com/apache/hadoop/pull/7066#issuecomment-2408455733

   Thanks for the reply.

   For an edit log segment (starttxid-endtxid), RedundantEditLogInputStream combines the EditLogInputStreams from the JournalNodes. When the standby NameNode replays the edit log, RedundantEditLogInputStream.nextOp() is executed. Assume it reads from txid (starttxid < txid < endtxid). Normally the state goes State.SKIP_UNTIL (skip from starttxid to txid) -> State.OK (which then reads txid+1). With the current logic, if skipUntil() throws an IOException in State.SKIP_UNTIL, the state still becomes State.OK rather than State.STREAM_FAILED. Then:

   (1) If the stream recovers, reading the next op may return an op in (starttxid, txid). FSEditLogLoader will log "There appears to be an out-of-order edit in the edit log", discard this op, and continue.
   (2) If the stream is still broken, State.OK will move to State.STREAM_FAILED and then switch to another EditLogInputStream.

   This PR makes State.SKIP_UNTIL (skip from starttxid to txid) -> State.STREAM_FAILED directly if skipUntil() throws an IOException.


> Fix RedundantEditLogInputStream.nextOp() state error when EditLogInputStream.skipUntil() throw IOException
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-17631
>                 URL: https://issues.apache.org/jira/browse/HDFS-17631
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: liuguanghua
>            Assignee: liuguanghua
>            Priority: Major
>              Labels: pull-request-available
>
> In NameNode HA mode, the standby NameNode loads the edit log from the JournalNodes via QuorumJournalManager.selectInputStreams(). RedundantEditLogInputStream is used to combine the input streams of multiple remote JournalNodes.
> The problem is that when reading the edit log with RedundantEditLogInputStream.nextOp(), if skipUntil() on the first stream throws an IOException (network errors, hardware problems, etc.), the state becomes State.OK rather than State.STREAM_FAILED.
> The proper, fault-tolerant sequence of states should be as below:
> State.SKIP_UNTIL -> State.STREAM_FAILED -> (try next stream) State.SKIP_UNTIL -> State.OK



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
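To make the two state sequences concrete, here is a minimal Java sketch. It is a hypothetical toy model, not Hadoop's RedundantEditLogInputStream: the OpStream interface, the FakeStream class, the `fixed` flag and the `trace` list are all illustrative assumptions. It only reproduces the SKIP_UNTIL / OK / STREAM_FAILED transitions discussed in HDFS-17631, with the first JournalNode stream's skipUntil() throwing an IOException and the second one healthy.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical toy model of the SKIP_UNTIL handling discussed in HDFS-17631. */
public class SkipUntilStateSketch {

    enum State { SKIP_UNTIL, OK, STREAM_FAILED }

    /** Hypothetical stand-in for one JournalNode's edit log stream. */
    interface OpStream {
        void skipUntil(long txId) throws IOException;

        /** Returns the txid of the next op, or -1 when the stream is exhausted. */
        long readOp() throws IOException;
    }

    /** In-memory stream over a list of txids; can simulate a failing skipUntil(). */
    static final class FakeStream implements OpStream {
        private final long[] txIds;
        private final boolean failSkip;
        private int pos = 0;

        FakeStream(long[] txIds, boolean failSkip) {
            this.txIds = txIds;
            this.failSkip = failSkip;
        }

        @Override
        public void skipUntil(long txId) throws IOException {
            if (failSkip) {
                throw new IOException("simulated network error during skipUntil");
            }
            while (pos < txIds.length && txIds[pos] < txId) {
                pos++; // advance past ops that were already applied
            }
        }

        @Override
        public long readOp() {
            return pos < txIds.length ? txIds[pos++] : -1;
        }
    }

    /**
     * Returns the txid of the op read after prevTxId, trying streams in order.
     * fixed == false: an IOException in SKIP_UNTIL still leads to OK (reported bug).
     * fixed == true:  an IOException in SKIP_UNTIL leads to STREAM_FAILED (proposed fix).
     */
    static long nextOp(List<OpStream> streams, long prevTxId, boolean fixed,
                       List<String> trace) throws IOException {
        int cur = 0;
        State state = State.SKIP_UNTIL;
        while (true) {
            trace.add(state.name());
            switch (state) {
                case SKIP_UNTIL:
                    try {
                        streams.get(cur).skipUntil(prevTxId + 1);
                        state = State.OK;
                    } catch (IOException e) {
                        state = fixed ? State.STREAM_FAILED : State.OK;
                    }
                    break;
                case OK:
                    // Reads from wherever the current stream is positioned.
                    return streams.get(cur).readOp();
                case STREAM_FAILED:
                    cur++; // fail over to the next JournalNode stream
                    if (cur >= streams.size()) {
                        throw new IOException("all streams failed");
                    }
                    state = State.SKIP_UNTIL;
                    break;
                default:
                    throw new IllegalStateException("unexpected state " + state);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        long[] segment = {1, 2, 3, 4, 5}; // one segment: starttxid = 1, endtxid = 5
        long prevTxId = 3;                // ops up to txid 3 were already applied
        for (boolean fixed : new boolean[] {false, true}) {
            List<OpStream> streams = List.of(
                    new FakeStream(segment, true),   // first JournalNode: skipUntil() fails
                    new FakeStream(segment, false)); // second JournalNode: healthy
            List<String> trace = new ArrayList<>();
            long txId = nextOp(streams, prevTxId, fixed, trace);
            System.out.println((fixed ? "fixed: " : "buggy: ")
                    + String.join(" -> ", trace) + " => txid " + txId);
        }
    }
}
```

Running main prints `buggy: SKIP_UNTIL -> OK => txid 1` and `fixed: SKIP_UNTIL -> STREAM_FAILED -> SKIP_UNTIL -> OK => txid 4`. In the buggy run the reader returns an op at or before the already-applied txid (the kind of op FSEditLogLoader reports as out-of-order), while the fixed run follows the sequence proposed in the issue and fails over to the healthy stream.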