[ https://issues.apache.org/jira/browse/HBASE-24445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17119769#comment-17119769 ]
Andrew Kyle Purtell commented on HBASE-24445:
---------------------------------------------
HBASE-24436 proposes an alternative; waiting for a resolution there.
> Improve default thread pool size for opening store files
> --------------------------------------------------------
>
> Key: HBASE-24445
> URL: https://issues.apache.org/jira/browse/HBASE-24445
> Project: HBase
> Issue Type: Improvement
> Reporter: Andrew Kyle Purtell
> Priority: Major
>
> For each store open we create a CompletionService and also create a thread
> pool for opening and closing store files. See HStore#openStoreFiles and
> HRegion#getStoreFileOpenAndCloseThreadPool. By default this pool has only one
> thread. It can be increased with "hbase.hstore.open.and.close.threads.max"
> but this config value is then divided by the number of stores in the region.
> "hbase.hstore.open.and.close.threads.max" is also used to size other thread
> pools for opening and closing the stores themselves, so it's an unfortunate
> overloading.
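To make the division concrete, here is a minimal sketch of the sizing arithmetic as the description states it. It is not the HBase source: the class and method names are hypothetical, and the floor of one thread is an assumption inferred from the statement that the pool has one thread by default.

```java
// Hypothetical helper, not HBase code: illustrates the per-store sizing
// arithmetic described in the issue text.
final class StoreFileOpenPoolSizing {
    /** Name of the existing setting, as quoted in the description. */
    static final String OPEN_AND_CLOSE_THREADS_MAX =
        "hbase.hstore.open.and.close.threads.max";

    /**
     * Threads available to one store when the region-wide setting is divided
     * by the number of stores. Integer division truncates, and the result is
     * floored at 1 (an assumption) so that a pool always has a worker.
     */
    static int perStoreThreads(int configuredMax, int numStores) {
        int stores = Math.max(1, numStores);
        return Math.max(1, configuredMax / stores);
    }

    public static void main(String[] args) {
        System.out.println(perStoreThreads(1, 3));   // default setting, 3 stores
        System.out.println(perStoreThreads(8, 3));   // 8 / 3 truncates to 2
        System.out.println(perStoreThreads(8, 16));  // 8 / 16 is 0, floored to 1
    }
}
```

Under these assumptions, raising the setting to 8 on a region with 16 stores still leaves each store with a single thread, which is the overloading problem the description points out.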
> We should have a configuration parameter that directly and simply tunes the
> thread pool size for opening store files. Introduce a new configuration
> parameter: "hbase.hstore.hfile.open.threads.max" which will define the upper
> bound for a thread pool shared by the entire store for opening hfiles. The
> default should be 1 to preserve default behavior.
> Once this is done, we could increase this to 2, 4, 8, or more for increased
> parallelism when opening store files without impact on other activities. The
> time required to open all storefiles often dominates the total time for
> bringing a region online. The thread pool will be shut down and eligible for
> garbage collection once all files are loaded and the store is online.
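A rough sketch of the proposal follows: a store-level pool sized directly from the new parameter, used through a CompletionService, and shut down once every file is loaded. Only the parameter name and its default of 1 come from the description. The class, the Map standing in for the HBase configuration object, and the string returned in place of a real store file are all placeholders for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch, not HBase code.
final class StoreFileOpenSketch {
    /** Parameter name proposed in the description. */
    static final String HFILE_OPEN_THREADS_MAX = "hbase.hstore.hfile.open.threads.max";
    /** Default proposed in the description, preserving current behavior. */
    static final int DEFAULT_HFILE_OPEN_THREADS_MAX = 1;

    /** Reads the pool size from a plain Map that stands in for the real configuration. */
    static int openThreads(Map<String, String> conf) {
        String raw = conf.get(HFILE_OPEN_THREADS_MAX);
        int configured = (raw == null)
            ? DEFAULT_HFILE_OPEN_THREADS_MAX
            : Integer.parseInt(raw.trim());
        return Math.max(1, configured);
    }

    /** Opens every file of one store in parallel, then releases the pool. */
    static List<String> openAll(Map<String, String> conf, List<String> paths) {
        ExecutorService pool = Executors.newFixedThreadPool(openThreads(conf));
        try {
            CompletionService<String> completion = new ExecutorCompletionService<>(pool);
            for (String path : paths) {
                // Placeholder for the real work of opening one store file.
                completion.submit(() -> "opened:" + path);
            }
            List<String> opened = new ArrayList<>(paths.size());
            for (int i = 0; i < paths.size(); i++) {
                // take() returns results in completion order, not submission order.
                opened.add(completion.take().get());
            }
            return opened;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while opening store files", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("failed to open a store file", e.getCause());
        } finally {
            // Once all files are loaded the pool is no longer needed.
            pool.shutdown();
        }
    }
}
```

With an empty configuration the sketch falls back to one thread, so files are opened serially as they are today. Setting the parameter to 4 lets up to four files of the same store open concurrently.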
> The number of threads used for opening files should scale with the number of
> stores, so allocating the pool at the store level continues to make sense.
> Longer term we might try recursively decomposing the region open task with a
> fork-join pool such that the opening of store files can be dynamically
> parallelized in a probably superior way (conjecture pending a real attempt
> with metrics).
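As one possible shape for the fork-join idea, the sketch below decomposes a region open into one task per store and one subtask per file, all run on a single ForkJoinPool. Every class name is hypothetical, the map of store name to file names is an assumed input format, and the returned string is a placeholder for opening a real store file. It says nothing about whether this would actually outperform per-store pools.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinTask;
import java.util.concurrent.RecursiveTask;

// Hypothetical sketch, not HBase code.
final class ForkJoinRegionOpenSketch {
    /** Leaf task: opens a single store file (placeholder work). */
    static final class OpenFile extends RecursiveTask<String> {
        private final String path;
        OpenFile(String path) { this.path = path; }
        @Override protected String compute() { return "opened:" + path; }
    }

    /** Opens all files of one store by forking one leaf task per file. */
    static final class OpenStore extends RecursiveTask<List<String>> {
        private final List<String> paths;
        OpenStore(List<String> paths) { this.paths = paths; }
        @Override protected List<String> compute() {
            List<OpenFile> tasks = new ArrayList<>();
            for (String path : paths) {
                tasks.add(new OpenFile(path));
            }
            ForkJoinTask.invokeAll(tasks);  // runs the subtasks, possibly in parallel
            List<String> opened = new ArrayList<>();
            for (OpenFile task : tasks) {
                opened.add(task.join());    // results kept in input order
            }
            return opened;
        }
    }

    /** Opens a region by forking one task per store. */
    static final class OpenRegion extends RecursiveTask<Map<String, List<String>>> {
        private final Map<String, List<String>> filesByStore;
        OpenRegion(Map<String, List<String>> filesByStore) { this.filesByStore = filesByStore; }
        @Override protected Map<String, List<String>> compute() {
            List<String> names = new ArrayList<>(filesByStore.keySet());
            List<OpenStore> tasks = new ArrayList<>();
            for (String name : names) {
                tasks.add(new OpenStore(filesByStore.get(name)));
            }
            ForkJoinTask.invokeAll(tasks);
            Map<String, List<String>> opened = new LinkedHashMap<>();
            for (int i = 0; i < names.size(); i++) {
                opened.put(names.get(i), tasks.get(i).join());
            }
            return opened;
        }
    }

    static Map<String, List<String>> openRegion(Map<String, List<String>> filesByStore) {
        ForkJoinPool pool = new ForkJoinPool();
        try {
            return pool.invoke(new OpenRegion(filesByStore));
        } finally {
            pool.shutdown();
        }
    }
}
```

Because idle workers steal queued subtasks, a store with many files can draw on threads that a store with few files does not need, without any per-store sizing parameter.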
--
This message was sent by Atlassian Jira
(v8.3.4#803005)