Erik Krogen created HDFS-13125:
----------------------------------
Summary: Improve efficiency of JN -> Standby Pipeline Under
Frequent Edit Tailing
Key: HDFS-13125
URL: https://issues.apache.org/jira/browse/HDFS-13125
Project: Hadoop HDFS
Issue Type: Improvement
Components: journal-node, namenode
Reporter: Erik Krogen
Assignee: Erik Krogen
The current edit tailing pipeline is designed for
* High resiliency
* High throughput
and was _not_ designed for low latency.
It was designed under the assumption that each edit log segment would typically
be read all at once, e.g. on startup or the SbNN tailing the entire thing after
it is finalized. The ObserverNode should be reading constantly from the
JournalNodes' in-progress edit logs with low latency, to reduce the lag time
from when a transaction is committed on the ANN and when it is visible on the
ObserverNode.
Due to the critical nature of this pipeline to the health of HDFS, it would be
better not to redesign it altogether. Based on some experiments it seems if we
mitigate the following issues, lag times are reduced to low levels (low
hundreds of milliseconds even under very high write load):
* The overhead of creating a new HTTP connection for each time new edits are
fetched. This makes sense when you're expecting to tail an entire segment; it
does not when you may only be fetching a small number of edits. We can mitigate
this by allowing edits to be tailed via an RPC call, or by adding a connection
pool for the existing connections to the journal.
* The overhead of transmitting a whole file at once. Right now when an edit
segment is requested, the JN sends the entire segment, and on the SbNN it will
ignore edits up to the ones it wants. How to solve this one may be more tricky,
but one suggestion would be to keep recently logged edits in memory, avoiding
the need to serve them from file at all, allowing the JN to quickly serve only
the required edits.
We can implement these as optimizations on top of the existing logic, with
fallbacks to the current slow-but-resilient pipeline.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]