[
https://issues.apache.org/jira/browse/HDFS-13125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Erik Krogen resolved HDFS-13125.
--------------------------------
Resolution: Duplicate
This was subsumed by HDFS-13150
> Improve efficiency of JN -> Standby Pipeline Under Frequent Edit Tailing
> ------------------------------------------------------------------------
>
> Key: HDFS-13125
> URL: https://issues.apache.org/jira/browse/HDFS-13125
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: journal-node, namenode
> Reporter: Erik Krogen
> Assignee: Erik Krogen
> Priority: Major
>
> The current edit tailing pipeline is designed for
> * High resiliency
> * High throughput
> and was _not_ designed for low latency.
> It was designed under the assumption that each edit log segment would
> typically be read all at once, e.g. on startup or the SbNN tailing the entire
> thing after it is finalized. The ObserverNode should be reading constantly
> from the JournalNodes' in-progress edit logs with low latency, to reduce the
> lag time from when a transaction is committed on the ANN and when it is
> visible on the ObserverNode.
> Due to the critical nature of this pipeline to the health of HDFS, it would
> be better not to redesign it altogether. Based on some experiments it seems
> if we mitigate the following issues, lag times are reduced to low levels (low
> hundreds of milliseconds even under very high write load):
> * The overhead of creating a new HTTP connection for each time new edits are
> fetched. This makes sense when you're expecting to tail an entire segment; it
> does not when you may only be fetching a small number of edits. We can
> mitigate this by allowing edits to be tailed via an RPC call, or by adding a
> connection pool for the existing connections to the journal.
> * The overhead of transmitting a whole file at once. Right now when an edit
> segment is requested, the JN sends the entire segment, and on the SbNN it
> will ignore edits up to the ones it wants. How to solve this one may be more
> tricky, but one suggestion would be to keep recently logged edits in memory,
> avoiding the need to serve them from file at all, allowing the JN to quickly
> serve only the required edits.
> We can implement these as optimizations on top of the existing logic, with
> fallbacks to the current slow-but-resilient pipeline.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]