[ 
https://issues.apache.org/jira/browse/HBASE-24445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17119769#comment-17119769
 ] 

Andrew Kyle Purtell commented on HBASE-24445:
---------------------------------------------

HBASE-24436 proposes an alternative; waiting for a resolution there.

> Improve default thread pool size for opening store files
> --------------------------------------------------------
>
>                 Key: HBASE-24445
>                 URL: https://issues.apache.org/jira/browse/HBASE-24445
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Andrew Kyle Purtell
>            Priority: Major
>
> For each store open we create a CompletionService and also create a thread 
> pool for opening and closing store files. See HStore#openStoreFiles and 
> HRegion#getStoreFileOpenAndCloseThreadPool. By default this pool has only one 
> thread. It can be increased with "hbase.hstore.open.and.close.threads.max" 
> but this config value is then divided by the number of stores in the region.
> "hbase.hstore.open.and.close.threads.max" is also used to size other thread 
> pools for opening and closing the stores themselves, so it's an unfortunate 
> overloading.
> We should have a configuration parameter that directly and simply tunes the 
> thread pool size for opening store files. Introduce a new configuration 
> parameter: "hbase.hstore.hfile.open.threads.max" which will define the upper 
> bound for a thread pool shared by the entire store for opening hfiles. The 
> default should be 1 to preserve default behavior.
> Once this is done, we could increase this to 2, 4, 8, or more for increased 
> parallelism when opening store files without impact on other activities. The 
> time required to open all storefiles often dominates the total time for 
> bringing a region online. The thread pool will be shut down and eligible for 
> garbage collection once all files are loaded and the store is online.
> The number of open threads should scale with the number of stores, so allocating 
> the pool at the store level continues to make sense.
> Longer term we might try recursively decomposing the region open task with a 
> fork-join pool such that the opening of store files can be dynamically 
> parallelized in a probably superior way (conjecture pending a real attempt 
> with metrics).
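
To make the sizing discussion in the description above concrete, here is a minimal sketch in plain Java. It is not HBase code. The class and method names are hypothetical, the floor of one thread in perStoreThreads is an assumption, and the string task stands in for opening an hfile reader. The first method mirrors the current division of "hbase.hstore.open.and.close.threads.max" by the number of stores; the second shows how a dedicated bound such as the proposed "hbase.hstore.hfile.open.threads.max" might size a short-lived per-store pool behind a CompletionService.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustrative sketch only; names are hypothetical and this is not HBase code.
class StoreFileOpenSketch {

  // Current behavior as described: the shared setting is divided by the
  // number of stores. The floor of one thread is an assumption of this sketch.
  static int perStoreThreads(int openAndCloseThreadsMax, int numStores) {
    return Math.max(1, openAndCloseThreadsMax / Math.max(1, numStores));
  }

  // Proposed behavior: a dedicated upper bound sizes a per-store pool that
  // lives only while the store files are being opened.
  static List<String> openStoreFiles(List<String> paths, int hfileOpenThreadsMax)
      throws InterruptedException, ExecutionException {
    // Never create more threads than there are files, and never fewer than one.
    int threads = Math.max(1, Math.min(hfileOpenThreadsMax, paths.size()));
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      CompletionService<String> completion = new ExecutorCompletionService<>(pool);
      for (String path : paths) {
        // Stand-in for opening an hfile reader.
        completion.submit(() -> "opened:" + path);
      }
      List<String> opened = new ArrayList<>();
      for (int i = 0; i < paths.size(); i++) {
        // Results arrive in completion order, not submission order.
        opened.add(completion.take().get());
      }
      return opened;
    } finally {
      // After shutdown the pool can be garbage collected once it is unreferenced.
      pool.shutdown();
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println(perStoreThreads(8, 3));
    System.out.println(openStoreFiles(Arrays.asList("f1", "f2", "f3"), 4).size());
  }
}
```

With the shared setting at 8 and three stores, each store gets only 2 threads under the current scheme, whereas the dedicated bound applies to each store directly.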


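The longer-term fork-join idea in the description could look roughly like the sketch below. This is only an illustration of recursive decomposition with java.util.concurrent, not a proposed patch. The class names, the split threshold, and the string task standing in for opening a store file are all hypothetical, and whether this actually beats a fixed pool is exactly the open question noted in the description.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Illustrative sketch only; names are hypothetical and this is not HBase code.
class ForkJoinOpenSketch {

  // Recursively halves the list of store file paths until a chunk is small
  // enough to open sequentially.
  static class OpenFilesTask extends RecursiveTask<List<String>> {
    private static final int THRESHOLD = 2; // hypothetical chunk size
    private final List<String> paths;

    OpenFilesTask(List<String> paths) {
      this.paths = paths;
    }

    @Override
    protected List<String> compute() {
      if (paths.size() <= THRESHOLD) {
        List<String> opened = new ArrayList<>();
        for (String p : paths) {
          opened.add("opened:" + p); // stand-in for opening a store file
        }
        return opened;
      }
      int mid = paths.size() / 2;
      OpenFilesTask left = new OpenFilesTask(paths.subList(0, mid));
      OpenFilesTask right = new OpenFilesTask(paths.subList(mid, paths.size()));
      left.fork(); // run the left half asynchronously
      List<String> result = new ArrayList<>(right.compute());
      result.addAll(0, left.join()); // keep input order: left before right
      return result;
    }
  }

  static List<String> openAll(List<String> paths, int parallelism) {
    ForkJoinPool pool = new ForkJoinPool(parallelism);
    try {
      return pool.invoke(new OpenFilesTask(paths));
    } finally {
      pool.shutdown();
    }
  }
}
```

Work stealing lets idle workers pick up pending halves, so the effective parallelism adapts to how many files each store has rather than being fixed per store up front.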

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
