[
https://issues.apache.org/jira/browse/KUDU-2014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Kurt Deschler reassigned KUDU-2014:
-----------------------------------
Assignee: Ashwani Raina
> Explore additional approaches to improve LBM startup time
> ---------------------------------------------------------
>
> Key: KUDU-2014
> URL: https://issues.apache.org/jira/browse/KUDU-2014
> Project: Kudu
> Issue Type: Improvement
> Components: fs
> Affects Versions: 1.4.0
> Reporter: Adar Dembo
> Assignee: Ashwani Raina
> Priority: Major
> Labels: data-scalability, roadmap-candidate
>
> The fix for KUDU-1549 added support for deleting full log block manager
> containers with no live blocks, and for compacting container metadata to omit
> CREATE/DELETE record pairs. Both of these will help reduce the amount of
> metadata that must be read at startup. However, there's more we can do to
> help; this JIRA captures some additional ideas worth exploring (if/when LBM
> startup once again becomes intolerable):
> In [this
> gerrit|https://gerrit.cloudera.org/#/c/6826/2/src/kudu/fs/log_block_manager.cc@90],
> Todd made the case that container metadata processing is seek-dominant:
> {quote}
> looking at a data/ dir on a cluster that has been around for quite some time,
> most of the metadata files seem to be around 400KB. Assuming 100MB/sec
> sequential throughput and 10ms seek, it definitely seems like the startup
> time would be seek-dominated (10 or 20ms seek depending whether various
> internal metadata pages are hot in cache, plus only 4ms of sequential read
> time).
> {quote}
> We theorized several ways to reduce seeking, all focused on reducing the
> number of discrete container metadata files read at startup:
> # Raise the container max data file size. This won't help on older versions
> of el6 with ext4, but will help everywhere else. It makes sense for the max
> data file size to be a function of the disk size anyway. And it's a pretty
> cheap way to extract more scalability.
> # Reuse container data file holes, explicitly to avoid creating so many
> containers. Perhaps with a round of "defragmentation" to simplify reuse, or
> perhaps not. As a side effect, metadata file compaction now becomes more
> important (and costly).
> # Eschew one metadata file per data file altogether and maintain just one
> metadata file. Deleting "dead" containers would no longer be an improvement
> for metadata startup cost. Metadata compaction would be a lot more expensive.
> Block records themselves would be larger, because each record now needs to
> point to a particular data file, though this can be mitigated in various
> ways. A variant of this would be to drop the 1-1 relationship between
> metadata and data files in favor of a many-to-many (m-n) mapping.
> # Reduce the number of extents in container metadata files via judicious
> preallocation.
> See the gerrit linked above for more details.
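
The seek-versus-read estimate quoted above can be checked with a quick back-of-the-envelope calculation. The sketch below is only an illustration: the function names and the container counts (10,000 and 1,000) are hypothetical, and the disk figures (100MB/sec sequential throughput, 10ms seek, 400KB metadata files) are the assumptions from the quoted comment, not measurements. It also assumes one seek per metadata file, the optimistic end of the "10 or 20ms" range.

```python
def read_ms(file_size_kb, throughput_mb_s=100.0):
    """Sequential read time in ms for one metadata file.

    With 1 MB = 1000 KB, KB divided by MB/s comes out directly in ms.
    """
    return file_size_kb / throughput_mb_s


def startup_ms(num_files, file_size_kb, seek_ms=10.0, seeks_per_file=1,
               throughput_mb_s=100.0):
    """Rough serial cost of reading every metadata file once on one disk."""
    per_file = seek_ms * seeks_per_file + read_ms(file_size_kb, throughput_mb_s)
    return num_files * per_file


# One 400KB file: 4ms of sequential read against a 10ms seek.
single_read = read_ms(400)

# Hypothetical comparison: the same total metadata volume (4GB) spread
# over many small files versus fewer, larger files.
many_small = startup_ms(num_files=10_000, file_size_kb=400)
fewer_large = startup_ms(num_files=1_000, file_size_kb=4_000)

print(f"read time per 400KB file: {single_read} ms")
print(f"10,000 x 400KB files:     {many_small / 1000} s")
print(f"1,000 x 4,000KB files:    {fewer_large / 1000} s")
```

Under these assumptions the sequential read portion is the same in both layouts (40 s), while the seek portion drops from 100 s to 10 s. That is the reason the ideas above all focus on reducing the number of discrete metadata files rather than the number of bytes read.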