[ 
https://issues.apache.org/jira/browse/HDFS-17631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17888834#comment-17888834
 ] 

ASF GitHub Bot commented on HDFS-17631:
---------------------------------------

LiuGuH commented on PR #7066:
URL: https://github.com/apache/hadoop/pull/7066#issuecomment-2408455733

   Thanks for the reply. 
   For an editlog file (starttxid-endtxid), RedundantEditLogInputStream combines 
the EditLogInputStreams from the journalnodes. When the standby namenode replays 
the editlog, RedundantEditLogInputStream.nextOp() is executed. Assume it reads 
from txid (starttxid < txid < endtxid).
   
   Currently the transition is State.SKIP_UNTIL (skip from starttxid to txid) 
-> State.OK (and then read txid+1).
   If SKIP_UNTIL throws an IOException, the current logic moves to State.OK 
rather than State.STREAM_FAILED.
   
   (1) If the stream returns to normal, reading the next op may return an op in 
(starttxid, txid). FSEditLogLoader will log "There appears to be an 
out-of-order edit in the edit log", discard this op, and continue.
   
   (2) If the stream is still broken, State.OK will change to 
State.STREAM_FAILED and then switch to another EditLogInputStream. 
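
   Case (1) can be pictured with a small sketch. This is not the FSEditLogLoader code; `OutOfOrderSketch` and `applyInOrder` are hypothetical names, and transactions are reduced to bare txids. It only shows the "discard an op below the expected txid and continue" behaviour described above.

```java
import java.util.ArrayList;
import java.util.List;

public class OutOfOrderSketch {
    /**
     * Hypothetical simplification: apply incoming txids in order and drop
     * any txid lower than the one the loader expects next.
     * Gaps (txid greater than expected) are not handled in this sketch.
     */
    static List<Long> applyInOrder(long expectedTxId, long[] incoming) {
        List<Long> applied = new ArrayList<>();
        for (long txid : incoming) {
            if (txid < expectedTxId) {
                // Out-of-order edit: discard it and keep reading.
                continue;
            }
            applied.add(txid);
            expectedTxId = txid + 1;
        }
        return applied;
    }

    public static void main(String[] args) {
        // The loader expects txid 5, but the stream restarts from txid 2.
        System.out.println(applyInOrder(5, new long[] {2, 3, 4, 5, 6}));
    }
}
```

   The ops 2, 3 and 4 are dropped, so the result stays correct, but the stream has silently re-read data it was supposed to skip.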
   
   This PR makes State.SKIP_UNTIL (skip from starttxid to txid) go directly to 
State.STREAM_FAILED if SKIP_UNTIL throws an IOException. 
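
   To make the two transition paths concrete, here is a minimal, self-contained simulation. It is a sketch under assumptions, not the RedundantEditLogInputStream source: `SkipUntilSketch`, `FakeStream` and the `failStreamOnSkipError` flag are hypothetical, the state enum is reduced to the three states discussed here, and ops are represented by their txids only.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SkipUntilSketch {
    // Simplified subset of the states named in this comment.
    enum State { SKIP_UNTIL, OK, STREAM_FAILED }

    /** Hypothetical stand-in for one journalnode's edit log stream. */
    static class FakeStream {
        private final boolean failOnSkip;
        private final long endTxId;
        private long nextTxId;

        FakeStream(long startTxId, long endTxId, boolean failOnSkip) {
            this.nextTxId = startTxId;
            this.endTxId = endTxId;
            this.failOnSkip = failOnSkip;
        }

        void skipUntil(long txid) throws IOException {
            if (failOnSkip) {
                throw new IOException("simulated network error in skipUntil");
            }
            nextTxId = Math.max(nextTxId, txid);
        }

        /** Returns the txid of the next op, or -1 at end of stream. */
        long nextOp() {
            return nextTxId <= endTxId ? nextTxId++ : -1;
        }
    }

    /** Runs the state machine and returns the visited states plus the first txid read. */
    static String run(boolean failStreamOnSkipError) {
        List<FakeStream> streams = Arrays.asList(
            new FakeStream(1, 10, true),    // first stream: skipUntil() throws
            new FakeStream(1, 10, false));  // second stream: healthy
        long wantedTxId = 5;
        int cur = 0;
        State state = State.SKIP_UNTIL;
        List<String> trace = new ArrayList<>();
        while (true) {
            trace.add(state.name());
            switch (state) {
            case SKIP_UNTIL:
                try {
                    streams.get(cur).skipUntil(wantedTxId);
                    state = State.OK;
                } catch (IOException e) {
                    // false: behaviour described as current (falls through to OK)
                    // true:  behaviour proposed by the PR (mark the stream failed)
                    state = failStreamOnSkipError ? State.STREAM_FAILED : State.OK;
                }
                break;
            case STREAM_FAILED:
                cur++;
                if (cur >= streams.size()) {
                    throw new IllegalStateException("no more streams to try");
                }
                state = State.SKIP_UNTIL;
                break;
            case OK: {
                long txid = streams.get(cur).nextOp();
                return String.join(" -> ", trace) + ", first txid read = " + txid;
            }
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("current:  " + run(false));
        System.out.println("proposed: " + run(true));
    }
}
```

   With the flag off, the sketch goes SKIP_UNTIL -> OK and reads txid 1 from the stream that never skipped. With the flag on, it goes SKIP_UNTIL -> STREAM_FAILED -> SKIP_UNTIL -> OK and reads txid 5 from the second stream.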
   
   
![image](https://github.com/user-attachments/assets/22f9c033-49ab-4cd0-a954-25a75db46178)
   
   
   
   




> Fix RedundantEditLogInputStream.nextOp()  state error when 
> EditLogInputStream.skipUntil() throw IOException
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-17631
>                 URL: https://issues.apache.org/jira/browse/HDFS-17631
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: liuguanghua
>            Assignee: liuguanghua
>            Priority: Major
>              Labels: pull-request-available
>
> In namenode HA mode, the standby namenode loads the editlog from journalnodes 
> via QuorumJournalManager.selectInputStreams(). RedundantEditLogInputStream is 
> used to combine multiple remote journalnode input streams.
> The problem is that when reading the editlog with 
> RedundantEditLogInputStream.nextOp(), if skipUntil() on the first stream 
> throws an IOException (network errors, hardware problems, etc.), the state 
> becomes State.OK rather than State.STREAM_FAILED. 
> The proper, fault-tolerant state sequence is as follows:
> State.SKIP_UNTIL -> State.STREAM_FAILED -> (try next stream) State.SKIP_UNTIL 
> -> State.OK



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
