[ https://issues.apache.org/jira/browse/KUDU-2014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16142344#comment-16142344 ]

Todd Lipcon commented on KUDU-2014:
-----------------------------------

Another thing I noticed on a cluster yesterday is that the tserver actually 
became CPU-bound inserting block records into the block map. This is 
effectively single-threaded, since we hold a single lock around the whole 
block map. Sharding the data structure could be a nice boost in the case 
where startup isn't IO-bound (a sketch of what that could look like follows).
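
A minimal sketch of the sharding idea, assuming a simplified uint64 block ID; 
the class, value type, and shard count here are hypothetical, not Kudu's 
actual block map:

{code}
#include <array>
#include <cstdint>
#include <functional>
#include <mutex>
#include <unordered_map>

// Hypothetical sharded block map: each shard has its own lock, so
// concurrent inserts only contend when two block IDs hash to the
// same shard. The shard count and value type are illustrative.
class ShardedBlockMap {
 public:
  static constexpr int kNumShards = 16;  // power of two for cheap masking

  void Insert(uint64_t block_id, uint64_t record) {
    Shard& s = shard(block_id);
    std::lock_guard<std::mutex> l(s.mu);
    s.map[block_id] = record;
  }

  bool Lookup(uint64_t block_id, uint64_t* record) {
    Shard& s = shard(block_id);
    std::lock_guard<std::mutex> l(s.mu);
    auto it = s.map.find(block_id);
    if (it == s.map.end()) return false;
    *record = it->second;
    return true;
  }

 private:
  struct Shard {
    std::mutex mu;
    std::unordered_map<uint64_t, uint64_t> map;
  };

  // Hash before masking so that sequential IDs spread across shards.
  Shard& shard(uint64_t block_id) {
    return shards_[std::hash<uint64_t>()(block_id) & (kNumShards - 1)];
  }

  std::array<Shard, kNumShards> shards_;
};
{code}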

> Explore additional approaches to improve LBM startup time
> ---------------------------------------------------------
>
>                 Key: KUDU-2014
>                 URL: https://issues.apache.org/jira/browse/KUDU-2014
>             Project: Kudu
>          Issue Type: Improvement
>          Components: fs
>    Affects Versions: 1.4.0
>            Reporter: Adar Dembo
>              Labels: data-scalability
>
> The fix for KUDU-1549 added support for deleting full log block manager 
> containers with no live blocks, and for compacting container metadata to omit 
> CREATE/DELETE record pairs. Both of these will help reduce the amount of 
> metadata that must be read at startup. However, there's more we can do to 
> help; this JIRA captures some additional ideas worth exploring (if/when LBM 
> startup once again becomes intolerable):
> In [this 
> gerrit|https://gerrit.cloudera.org/#/c/6826/2/src/kudu/fs/log_block_manager.cc@90],
>  Todd made the case that container metadata processing is seek-dominant:
> {quote}
> looking at a data/ dir on a cluster that has been around for quite some time, 
> most of the metadata files seem to be around 400KB. Assuming 100MB/sec 
> sequential throughput and 10ms seek, it definitely seems like the startup 
> time would be seek-dominated (10 or 20ms seek depending whether various 
> internal metadata pages are hot in cache, plus only 4ms of sequential read 
> time). 
> {quote}
> We theorized several ways to reduce seeking, all focused on reducing the 
> number of discrete container metadata files read at startup:
> # Raise the container max data file size. This won't help on older versions 
> of el6 with ext4, but will help everywhere else. It makes sense for the max 
> data file size to be a function of the disk size anyway. And it's a pretty 
> cheap way to extract more scalability.
> # Reuse container data file holes, explicitly to avoid creating so many 
> containers. Perhaps with a round of "defragmentation" to simplify reuse, or 
> perhaps not. As a side effect, metadata file compaction now becomes more 
> important (and costly).
> # Eschew one metadata file per data file altogether and maintain just one 
> metadata file. Deleting "dead" containers would no longer be an improvement 
> for metadata startup cost. Metadata compaction would be a lot more expensive. 
> Block records themselves would be larger, because each record now needs to 
> point to a particular data file, though this can be mitigated in various 
> ways. A variant of this would be to do away with the 1-1 relationship between 
> metadata and data files and make it more like m-n.
> # Reduce the number of extents in container metadata files via judicious 
> preallocation.
> See the gerrit linked above for more details.
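
To make the seek math above concrete: at ~400 KB of metadata per container, a 
deployment with 10,000 containers reads only ~4 GB sequentially (~40 s at 
100 MB/sec) but pays roughly 100-200 s in seeks, so anything that cuts the 
container count attacks the dominant term. For idea #1, a hedged sketch of 
tying the max data file size to the disk size (all constants and names are 
illustrative, not an actual Kudu policy):

{code}
#include <algorithm>
#include <cstdint>

// Illustrative policy: size each container's data file as a fixed
// fraction of its disk, clamped to sane bounds, so bigger disks get
// fewer, larger containers. All constants here are made up.
uint64_t MaxContainerDataFileSize(uint64_t disk_size_bytes) {
  constexpr uint64_t kFloor = 10ULL << 30;     // 10 GiB minimum
  constexpr uint64_t kCeiling = 100ULL << 30;  // 100 GiB maximum
  return std::max(kFloor, std::min(kCeiling, disk_size_bytes / 1000));
}
{code}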
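For idea #3, the main per-record cost is a data-file identifier; a 
hypothetical record layout (field names illustrative, not Kudu's actual 
BlockRecordPB) shows where the growth comes from:

{code}
#include <cstdint>

// Hypothetical record in a single shared metadata file. Relative to
// today's per-container records, data_file_id is the new cost; it could
// be mitigated by delta-encoding IDs or grouping runs of records that
// share a data file. Field names are illustrative, not Kudu's actual
// BlockRecordPB.
struct BlockRecord {
  uint64_t block_id;      // which block this record describes
  uint32_t data_file_id;  // NEW: which data file holds the block
  uint64_t offset;        // byte offset of the block in that data file
  uint64_t length;        // block length in bytes
  bool is_delete;         // distinguishes CREATE vs. DELETE records
};
{code}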
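And for idea #4, extent counts stay low if metadata files are preallocated in 
large steps rather than grown write-by-write. A minimal Linux sketch using 
fallocate(2), with an illustrative step size:

{code}
#define _GNU_SOURCE  // for fallocate(2) on glibc
#include <fcntl.h>
#include <cstdint>

// Grow a metadata file in 8 MiB preallocated steps so the filesystem
// lays it out as a few large extents instead of many small ones.
// The step size is illustrative. Returns 0 on success, -1 on error.
int PreallocateNextStep(int fd, uint64_t current_size) {
  constexpr uint64_t kStep = 8ULL << 20;  // 8 MiB per step
  // mode 0 allocates blocks and extends the visible file size;
  // FALLOC_FL_KEEP_SIZE would preallocate without changing the length.
  return fallocate(fd, /*mode=*/0, current_size, kStep);
}
{code}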



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
