[ https://issues.apache.org/jira/browse/HDFS-13125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Erik Krogen resolved HDFS-13125. -------------------------------- Resolution: Duplicate This was subsumed by HDFS-13150 > Improve efficiency of JN -> Standby Pipeline Under Frequent Edit Tailing > ------------------------------------------------------------------------ > > Key: HDFS-13125 > URL: https://issues.apache.org/jira/browse/HDFS-13125 > Project: Hadoop HDFS > Issue Type: Improvement > Components: journal-node, namenode > Reporter: Erik Krogen > Assignee: Erik Krogen > Priority: Major > > The current edit tailing pipeline is designed for > * High resiliency > * High throughput > and was _not_ designed for low latency. > It was designed under the assumption that each edit log segment would > typically be read all at once, e.g. on startup or the SbNN tailing the entire > thing after it is finalized. The ObserverNode should be reading constantly > from the JournalNodes' in-progress edit logs with low latency, to reduce the > lag time from when a transaction is committed on the ANN and when it is > visible on the ObserverNode. > Due to the critical nature of this pipeline to the health of HDFS, it would > be better not to redesign it altogether. Based on some experiments it seems > if we mitigate the following issues, lag times are reduced to low levels (low > hundreds of milliseconds even under very high write load): > * The overhead of creating a new HTTP connection for each time new edits are > fetched. This makes sense when you're expecting to tail an entire segment; it > does not when you may only be fetching a small number of edits. We can > mitigate this by allowing edits to be tailed via an RPC call, or by adding a > connection pool for the existing connections to the journal. > * The overhead of transmitting a whole file at once. Right now when an edit > segment is requested, the JN sends the entire segment, and on the SbNN it > will ignore edits up to the ones it wants. How to solve this one may be more > tricky, but one suggestion would be to keep recently logged edits in memory, > avoiding the need to serve them from file at all, allowing the JN to quickly > serve only the required edits. > We can implement these as optimizations on top of the existing logic, with > fallbacks to the current slow-but-resilient pipeline. -- This message was sent by Atlassian JIRA (v7.6.3#76005) --------------------------------------------------------------------- To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org