ZanderXu opened a new pull request, #4560:
URL: https://github.com/apache/hadoop/pull/4560
### Description of PR
The JournalNode should throw `CacheMissException` if `sinceTxId` is greater than
`highestWrittenTxId`. Currently it returns an empty response instead, which can
prevent the EditLogTailer from tailing edits and, in turn, can leave the
Observer NameNode unable to handle requests from clients.
Suppose there are 3 JournalNodes, JN0 ~ JN2.
The corner case is as below:
* JN0 hits an abnormal condition while the Active NameNode is journaling edits
with start txId 11
* The NameNode ignores the abnormal JN0 and continues to write edits to
JN1 and JN2
* JN0 returns to health
* The Observer NameNode tries to select an EditLogInputStream via RPC with start
txId 21
* JN1 hits an abnormal condition that causes slow RPC responses
The expected result is a response containing 20 edits, txId 21 through txId 40,
from JN1 and JN2. The Active NameNode successfully wrote these edits to JN1 and
JN2 but failed to write them to JN0, so the cache of JN0 holds no edits from
txId 21 to 40.
But in the current implementation, the response contains no edits, because the
NameNode successfully got a response from JN0 that did not contain any edits.
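To make the effect of the empty response concrete, here is a minimal sketch of the selection step. It assumes that the NameNode waits for a majority of JournalNodes to respond and then accepts only as many transactions as a majority of the responders can serve. `QuorumSelectSketch` and `maxAllowedTxns` are hypothetical names for illustration, not the actual Hadoop classes or methods.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of quorum-based edit selection; not the real Hadoop code.
public class QuorumSelectSketch {

  /**
   * Returns the largest transaction count that at least `majority`
   * of the responding JournalNodes can serve.
   */
  public static int maxAllowedTxns(List<Integer> responseCounts, int majority) {
    if (responseCounts.size() < majority) {
      throw new IllegalArgumentException("not enough responses for a quorum");
    }
    List<Integer> sorted = new ArrayList<>(responseCounts);
    Collections.sort(sorted);
    // The entry at this index is reachable on `majority` responders.
    return sorted.get(sorted.size() - majority);
  }

  public static void main(String[] args) {
    int majority = 2; // 3 JournalNodes

    // JN1 is slow, so only JN0 (empty response) and JN2 (20 edits) answer.
    System.out.println(maxAllowedTxns(Arrays.asList(0, 20), majority)); // 0

    // Assumption: if JN0 failed instead of answering with 0 edits, the
    // NameNode would wait for JN1, and the quorum would be JN1 and JN2.
    System.out.println(maxAllowedTxns(Arrays.asList(20, 20), majority)); // 20
  }
}
```

Under these assumptions, a successful but empty response from JN0 counts toward the quorum and drags the result down to zero edits, while a failed response from JN0 would not.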
The buggy code is as below:
```
if (sinceTxId > getHighestWrittenTxId()) {
  // Requested edits that don't exist yet; short-circuit the cache here
  metrics.rpcEmptyResponses.incr();
  return GetJournaledEditsResponseProto.newBuilder().setTxnCount(0).build();
}
```
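A minimal sketch of the proposed behavior follows. It is a self-contained simulation, not the actual patch: `JournalCacheSketch` is a hypothetical class, its `CacheMissException` is an unchecked stand-in for the real exception, and the value 10 for JN0's `highestWrittenTxId` is an assumption derived from the scenario (JN0 missed the edits starting at txId 11).

```java
// Hypothetical simulation of the proposed check; not the real JournalNode code.
public class JournalCacheSketch {

  /** Stand-in for the real CacheMissException (unchecked here for brevity). */
  public static class CacheMissException extends RuntimeException {
    public CacheMissException(String message) {
      super(message);
    }
  }

  private final long highestWrittenTxId;

  public JournalCacheSketch(long highestWrittenTxId) {
    this.highestWrittenTxId = highestWrittenTxId;
  }

  /**
   * Returns how many edits this node can serve starting at sinceTxId,
   * capped at maxTxns. Fails instead of answering with an empty response
   * when the requested txId is beyond what this node has written.
   */
  public int getJournaledEdits(long sinceTxId, int maxTxns) {
    if (sinceTxId > highestWrittenTxId) {
      throw new CacheMissException("sinceTxId " + sinceTxId
          + " is greater than highestWrittenTxId " + highestWrittenTxId);
    }
    long available = highestWrittenTxId - sinceTxId + 1;
    return (int) Math.min(available, maxTxns);
  }

  public static void main(String[] args) {
    JournalCacheSketch jn0 = new JournalCacheSketch(10); // assumed: missed txId 11 onward
    JournalCacheSketch jn2 = new JournalCacheSketch(40);

    System.out.println(jn2.getJournaledEdits(21, 100)); // 20
    try {
      jn0.getJournaledEdits(21, 100);
    } catch (CacheMissException e) {
      System.out.println("JN0 reports a cache miss: " + e.getMessage());
    }
  }
}
```

With this check, JN0 no longer contributes a successful empty response for txId 21, so the caller can tell a lagging JournalNode apart from one that has edits to serve.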