[
https://issues.apache.org/jira/browse/HDFS-16659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ZanderXu updated HDFS-16659:
----------------------------
Summary: JournalNode should throw CacheMissException if SinceTxId is bigger
than HighestWrittenTxId (was: JournalNode should throw CacheMissException when
SinceTxId is more than HighestWrittenTxId)
> JournalNode should throw CacheMissException if SinceTxId is bigger than
> HighestWrittenTxId
> ------------------------------------------------------------------------------------------
>
> Key: HDFS-16659
> URL: https://issues.apache.org/jira/browse/HDFS-16659
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: ZanderXu
> Assignee: ZanderXu
> Priority: Critical
>
> JournalNode should throw `CacheMissException` if `sinceTxId` is bigger than
> `highestWrittenTxId`. Otherwise, EditLogTailer may be unable to tail edits,
> and as a result the Observer NameNode may be unable to handle requests from
> clients.
> Suppose there are 3 JournalNodes, JN0 ~ JN2.
> The corner case is as follows:
> * JN0 hits an abnormal condition while the Active NameNode is journaling
> edits starting at txId 11
> * The NameNode ignores the abnormal JN0 and continues to write edits to
> JN1 and JN2
> * JN0 returns to a healthy state
> * The Observer NameNode tries to select an EditLogInputStream via RPC with
> start txId 21
> * JN1 hits an abnormal condition that causes slow RPC responses
> The expected selection result is a response containing the 20 edits from
> txId 21 to txId 40, served by JN1 and JN2. The Active NameNode successfully
> wrote these edits to JN1 and JN2 but failed to write them to JN0, so
> the cache of JN0 holds no edits from txId 21 to 40.
> In the current implementation, however, the response contains no edits,
> because the NameNode successfully got a response from JN0 that did not
> contain any edits.
> The buggy code is as follows:
> {code:java}
> if (sinceTxId > getHighestWrittenTxId()) {
>   // Requested edits that don't exist yet; short-circuit the cache here
>   metrics.rpcEmptyResponses.incr();
>   return GetJournaledEditsResponseProto.newBuilder().setTxnCount(0).build();
> }
> {code}
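
The behavior requested in the summary can be sketched in isolation. The following is a minimal, self-contained model, not the actual Hadoop code or the actual patch: the class `JournalCacheSketch`, its nested `CacheMissException`, and the `getJournaledEdits(long, int)` signature are hypothetical stand-ins, and edits are represented only by their txIds. It shows a lagging JournalNode reporting a cache miss for a `sinceTxId` above its `highestWrittenTxId` instead of returning a successful empty response.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of one JournalNode's edit cache (not Hadoop code). */
class JournalCacheSketch {

    /** Hypothetical stand-in for the exception named in the issue summary. */
    static class CacheMissException extends Exception {
        CacheMissException(String message) {
            super(message);
        }
    }

    // Highest txId this JournalNode has written, as in the scenario above.
    private final long highestWrittenTxId;

    JournalCacheSketch(long highestWrittenTxId) {
        this.highestWrittenTxId = highestWrittenTxId;
    }

    /**
     * Returns the txIds of the edits starting at sinceTxId, at most maxTxns
     * of them. Throws instead of returning an empty list when the caller
     * asks for edits this node has never written.
     */
    List<Long> getJournaledEdits(long sinceTxId, int maxTxns)
            throws CacheMissException {
        if (sinceTxId > highestWrittenTxId) {
            // Proposed behavior: signal a miss so the caller does not treat
            // this node's answer as a valid "no edits" response.
            throw new CacheMissException("Requested txId " + sinceTxId
                + " is beyond highest written txId " + highestWrittenTxId);
        }
        List<Long> txIds = new ArrayList<>();
        for (long txId = sinceTxId;
             txId <= highestWrittenTxId && txIds.size() < maxTxns;
             txId++) {
            txIds.add(txId);
        }
        return txIds;
    }

    public static void main(String[] args) {
        JournalCacheSketch jn0 = new JournalCacheSketch(10); // missed txId 11..40
        JournalCacheSketch jn2 = new JournalCacheSketch(40); // has txId 11..40
        try {
            jn0.getJournaledEdits(21, 100);
        } catch (CacheMissException e) {
            System.out.println("JN0: " + e.getMessage());
        }
        try {
            System.out.println("JN2 edits: " + jn2.getJournaledEdits(21, 100).size());
        } catch (CacheMissException e) {
            System.out.println("JN2: " + e.getMessage());
        }
    }
}
```

In this model JN0 is given a highest written txId of 10 on the assumption that it missed every edit from txId 11 onward, while JN2 holds edits up to txId 40. How the NameNode side reacts to the exception (for example, by waiting for another JournalNode) is outside the scope of this sketch.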
--
This message was sent by Atlassian Jira
(v8.20.10#820010)