[ 
https://issues.apache.org/jira/browse/HDFS-16659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDFS-16659:
----------------------------------
    Labels: pull-request-available  (was: )

> JournalNode should throw CacheMissException if SinceTxId is bigger than 
> HighestWrittenTxId
> ------------------------------------------------------------------------------------------
>
>                 Key: HDFS-16659
>                 URL: https://issues.apache.org/jira/browse/HDFS-16659
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: ZanderXu
>            Assignee: ZanderXu
>            Priority: Critical
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> JournalNode should throw `CacheMissException` if `sinceTxId` is bigger than 
> `highestWrittenTxId`. Otherwise, EditLogTailer may be unable to tail edits, 
> and the Observer NameNode may in turn be unable to handle requests from 
> clients.
> Suppose there are 3 JournalNodes, JN0 ~ JN2.
> The corner case is as follows:
> * JN0 runs into a problem while the Active NameNode is journaling edits 
> starting at txId 11
> * The Active NameNode ignores the unhealthy JN0 and continues to write edits 
> to JN1 and JN2
> * JN0 returns to a healthy state
> * The Observer NameNode tries to select an EditLogInputStream via RPC with 
> start txId 21
> * JN1 runs into a problem that causes slow RPC responses
> The expected result of the selection is a response containing the 20 edits 
> from txId 21 to txId 40, served by JN1 and JN2. The Active NameNode 
> successfully wrote these edits to JN1 and JN2 but failed to write them to 
> JN0, so the cache of JN0 holds no edits from txId 21 to 40.
> In the current implementation, however, the response contains no edits, 
> because the NameNode got a successful response from JN0 that did not 
> contain any edits.
> The buggy code is as follows:
> {code:java}
> if (sinceTxId > getHighestWrittenTxId()) {
>     // Requested edits that don't exist yet; short-circuit the cache here
>     metrics.rpcEmptyResponses.incr();
>     return GetJournaledEditsResponseProto.newBuilder().setTxnCount(0).build();
> }
> {code}
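
The scenario quoted above can be replayed with a small self-contained sketch. It is a toy model under stated assumptions, not the Hadoop code. `ToyJournal`, `selectTxnCount`, `replay`, and the local `CacheMissException` are hypothetical stand-ins. The selection rule (take the first majority of successful responses and use the smallest edit count among them) is an assumed simplification of how the NameNode combines JournalNode answers. JN0 is assumed to have stopped at txId 10, since the report only says it failed while edits starting at txId 11 were being journaled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class EmptyResponseSketch {

    /** Hypothetical stand-in for the cache-miss signal; not the Hadoop class. */
    static class CacheMissException extends Exception {
        CacheMissException(String message) {
            super(message);
        }
    }

    /** Toy JournalNode that knows only the highest txId it has written. */
    static class ToyJournal {
        final long highestWrittenTxId;
        final boolean throwOnNewerTxId;

        ToyJournal(long highestWrittenTxId, boolean throwOnNewerTxId) {
            this.highestWrittenTxId = highestWrittenTxId;
            this.throwOnNewerTxId = throwOnNewerTxId;
        }

        /** Returns how many edits starting at sinceTxId this node can serve. */
        int getTxnCount(long sinceTxId) throws CacheMissException {
            if (sinceTxId > highestWrittenTxId) {
                if (throwOnNewerTxId) {
                    // Behavior requested in the report: signal a miss.
                    throw new CacheMissException("sinceTxId " + sinceTxId
                        + " > highestWrittenTxId " + highestWrittenTxId);
                }
                // Behavior described in the report: empty but successful response.
                return 0;
            }
            return (int) (highestWrittenTxId - sinceTxId + 1);
        }
    }

    /**
     * Assumed selection rule: take the first `majority` successful responses
     * in arrival order and return the smallest edit count among them.
     */
    static int selectTxnCount(List<ToyJournal> arrivalOrder, long sinceTxId,
                              int majority) {
        List<Integer> counts = new ArrayList<>();
        for (ToyJournal jn : arrivalOrder) {
            try {
                counts.add(jn.getTxnCount(sinceTxId));
            } catch (CacheMissException e) {
                continue; // a failed response does not count toward the quorum
            }
            if (counts.size() == majority) {
                return Collections.min(counts);
            }
        }
        return 0; // quorum not reached
    }

    /** Replays the scenario; JN1 is slow, so its response arrives last. */
    static int replay(boolean throwOnNewerTxId) {
        List<ToyJournal> arrivalOrder = Arrays.asList(
            new ToyJournal(10, throwOnNewerTxId),   // JN0 (assumed to stop at 10)
            new ToyJournal(40, throwOnNewerTxId),   // JN2
            new ToyJournal(40, throwOnNewerTxId));  // JN1 (slow)
        return selectTxnCount(arrivalOrder, 21, 2);
    }

    public static void main(String[] args) {
        System.out.println("empty response: " + replay(false));
        System.out.println("exception:      " + replay(true));
    }
}
```

In this model, the empty response from JN0 is accepted as one of the two quorum answers and drags the selected count down to 0. When JN0 throws instead, its answer is discarded, the selection waits for the slow JN1, and the 20 edits from txId 21 to 40 are selected.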



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
