[ 
https://issues.apache.org/jira/browse/HDFS-16659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZanderXu updated HDFS-16659:
----------------------------
    Description: 
### Description of PR
JournalNode should throw `CacheMissException` if `sinceTxId` is greater than 
`highestWrittenTxId` while handling a `getJournaledEdits` RPC from NameNodes. 
With the current logic, the in-progress EditLogTailer may be unable to replay 
any Edits from the JournalNodes in some corner cases, leaving the 
ObserverNameNode unable to handle requests from clients.

Suppose there are 3 JournalNodes, JN0 ~ JN2.
* JN0 hits some abnormal condition while the Active NameNode is syncing 10 
Edits starting at txId 11
* The NameNode ignores the abnormal JN0 and continues to sync Edits to JN1 
and JN2
* JN0 returns to health
* The NameNode continues and syncs 10 Edits starting at txId 21
* At this point, Edits 11 ~ 30 are not in the cache of JN0
* The Observer NameNode tries to select an EditLogInputStream through 
`getJournaledEdits` with since txId 21
* JN2 hits some abnormal condition and responds slowly

The expected result is: the Response should contain the 10 Edits from txId 21 
to txId 30 from JN1 and JN2, because the Active NameNode successfully wrote 
these Edits to JN1 and JN2 and failed to write them to JN0.

But in the current implementation, the responses are [Response(0) from JN0, 
Response(10) from JN1], because JN2 responds slowly due to some abnormal 
condition such as GC or a bad network. As a result, `maxAllowedTxns` is 0 and 
the NameNode does not replay any Edits.
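
To make the arithmetic concrete, here is a minimal sketch of how a quorum
reader could arrive at `maxAllowedTxns` = 0 in this scenario. It assumes the
reader sorts the per-JournalNode txn counts and takes the count that a
majority of JournalNodes can serve. The class and method names are
hypothetical and this is not the actual QuorumJournalManager code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Illustrative only: hypothetical names, not the real HDFS classes. */
final class QuorumTxnCountSketch {

  /**
   * Given the txn count reported by each JournalNode that responded,
   * returns the largest count that at least majoritySize of them can serve.
   */
  static int maxAllowedTxns(List<Integer> responseCounts, int majoritySize) {
    if (responseCounts.size() < majoritySize) {
      throw new IllegalStateException("no quorum of responses");
    }
    List<Integer> sorted = new ArrayList<>(responseCounts);
    Collections.sort(sorted);
    // The majoritySize-th largest count is held by a majority of responders.
    return sorted.get(sorted.size() - majoritySize);
  }

  public static void main(String[] args) {
    int majority = 2; // 3 JournalNodes in total

    // Current behavior: JN0 answers with 0 Edits, JN1 with 10, JN2 is slow.
    System.out.println(maxAllowedTxns(Arrays.asList(0, 10), majority)); // 0

    // If JN0 had failed the call, the reader would wait for JN2 instead,
    // and both JN1 and JN2 hold Edits 21 ~ 30.
    System.out.println(maxAllowedTxns(Arrays.asList(10, 10), majority)); // 10
  }
}
```

The empty but successful response from JN0 counts toward the quorum, so the
reader stops waiting for JN2 and the majority-held count collapses to 0.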


As shown above, the root cause is that the JournalNode returns an empty 
response instead of throwing `CacheMissException` when `sinceTxId` is greater 
than `highestWrittenTxId`.

The buggy code is below:
{code:java}
if (sinceTxId > getHighestWrittenTxId()) {
    // Requested edits that don't exist yet; short-circuit the cache here
    metrics.rpcEmptyResponses.incr();
    return GetJournaledEditsResponseProto.newBuilder().setTxnCount(0).build(); 
}
{code}
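
For comparison, a rough sketch of the behavior this description asks for:
fail the call rather than answer with zero Edits. Everything here is a
hypothetical stand-in (the `JournalSketch` class, its constructor, and the
nested exception type, which is unchecked only to keep the sketch short); the
real JournalNode code and the eventual patch may look different.

```java
/** Illustrative only: hypothetical stand-in for the JournalNode-side check. */
final class JournalSketch {

  /** Hypothetical stand-in for the exception named in the description. */
  static final class CacheMissException extends RuntimeException {
    CacheMissException(String message) {
      super(message);
    }
  }

  private final long highestWrittenTxId;

  JournalSketch(long highestWrittenTxId) {
    this.highestWrittenTxId = highestWrittenTxId;
  }

  /** Returns how many Edits would be served starting at sinceTxId. */
  int getJournaledEdits(long sinceTxId, int maxTxns) {
    if (sinceTxId > highestWrittenTxId) {
      // Signal a failure so the caller does not count this JournalNode
      // as a successful (but empty) response.
      throw new CacheMissException("sinceTxId " + sinceTxId
          + " is greater than highestWrittenTxId " + highestWrittenTxId);
    }
    long available = highestWrittenTxId - sinceTxId + 1;
    return (int) Math.min(available, maxTxns);
  }

  public static void main(String[] args) {
    // JN1 has written up to txId 30, so it can serve Edits 21 ~ 30.
    System.out.println(new JournalSketch(30).getJournaledEdits(21, 100));

    // JN0 missed Edits 11 ~ 30, so its highest written txId is still 10.
    try {
      new JournalSketch(10).getJournaledEdits(21, 100);
    } catch (CacheMissException e) {
      System.out.println("JN0 failed the call: " + e.getMessage());
    }
  }
}
```

With JN0 failing the call, the reader has to wait for a second successful
response, which in the scenario above would come from JN2.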


  was:
JournalNode should throw `CacheMissException` if `sinceTxId` is greater than 
`highestWrittenTxId`. Otherwise the EditLogTailer may be unable to tail 
edits, and the ObserverNameNode may be unable to handle requests from 
clients.

Suppose there are 3 JournalNodes, JN0 ~ JN2.
The corner case is as below:
* JN0 hits some abnormal condition while the Active NameNode is journaling 
Edits with start txId 11
* The NameNode ignores the abnormal JN0 and continues to write Edits to JN1 
and JN2
* JN0 returns to health
* The Observer NameNode tries to select an EditLogInputStream via RPC with 
start txId 21
* JN1 hits some abnormal condition that causes a slow RPC response

The expected selecting result is: the Response should contain the 20 Edits 
from txId 21 to txId 40 from JN1 and JN2, because the Active NameNode 
successfully wrote these Edits to JN1 and JN2 and failed to write them to JN0, 
so there are no Edits from txId 21 to 40 in the cache of JN0.

But in the current implementation, there are no Edits in the Response, because 
the NameNode successfully got a response from JN0 that did not contain any 
Edits.
The buggy code is below:

{code:java}
if (sinceTxId > getHighestWrittenTxId()) {
    // Requested edits that don't exist yet; short-circuit the cache here
    metrics.rpcEmptyResponses.incr();
    return GetJournaledEditsResponseProto.newBuilder().setTxnCount(0).build(); 
}
{code}



> JournalNode should throw CacheMissException if SinceTxId is bigger than 
> HighestWrittenTxId
> ------------------------------------------------------------------------------------------
>
>                 Key: HDFS-16659
>                 URL: https://issues.apache.org/jira/browse/HDFS-16659
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: ZanderXu
>            Assignee: ZanderXu
>            Priority: Critical
>              Labels: pull-request-available
>          Time Spent: 40m
>  Remaining Estimate: 0h


