Andrew Kyle Purtell created HBASE-24445:
-------------------------------------------
Summary: Improve default thread pool size for opening store files
Key: HBASE-24445
URL: https://issues.apache.org/jira/browse/HBASE-24445
Project: HBase
Issue Type: Improvement
Reporter: Andrew Kyle Purtell
For each store open we create a CompletionService and also create a thread pool
for opening and closing store files. See HStore#openStoreFiles and
HRegion#getStoreFileOpenAndCloseThreadPool. By default this pool has only one
thread. It can be increased with "hbase.hstore.open.and.close.threads.max", but
this config value is then divided by the number of stores in the region.
"hbase.hstore.open.and.close.threads.max" is also used to size other thread
pools for opening and closing the stores themselves, so it's an unfortunate
overloading.
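To illustrate why raising the shared setting has a diluted effect, here is a
minimal sketch of the division described above. The class and method names are
hypothetical, and the floor of one thread is an assumption inferred from the
default pool size of one; this is not the HRegion code itself.

```java
// Hypothetical helper for illustration only; not HBase source code.
final class StoreFilePoolSizing {
    private StoreFilePoolSizing() {}

    /**
     * Threads left for one store's file-opening pool when a region-wide
     * setting is divided across the stores of the region.
     * Assumption: integer division, floored at one thread.
     */
    static int perStoreThreads(int configuredMax, int numStores) {
        int stores = Math.max(1, numStores);        // guard against zero stores
        return Math.max(1, configuredMax / stores); // never fewer than one thread
    }
}
```

Under these assumptions, a setting of 8 on a region with 4 stores leaves each
store with 2 threads, and the same setting on a region with 16 stores leaves
each store with 1.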
We should have a configuration parameter that directly and simply tunes the
thread pool size for opening store files. Introduce a new configuration
parameter: "hbase.hstore.hfile.open.threads.max" which will define the upper
bound for a thread pool shared by the entire store for opening hfiles. The
default should be 1 to preserve default behavior.
Once this is done, we could increase this to 2, 4, 8, or more for increased
parallelism when opening store files without impact on other activities. The
time required to open all storefiles often dominates the total time for
bringing a region online. The thread pool will be shut down and eligible for
garbage collection once all files are loaded and the store is online.
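A rough sketch of the proposal, using only java.util.concurrent, might look
like the following. The configuration is modeled as a plain Map standing in for
the real configuration object, and the class and method names are hypothetical;
only the property name and its default of 1 come from this issue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch; not the HStore implementation.
final class StoreFileOpenSketch {
    static final String OPEN_THREADS_KEY = "hbase.hstore.hfile.open.threads.max";
    static final int DEFAULT_OPEN_THREADS = 1; // preserves current default behavior

    /** Reads the proposed setting from a Map that stands in for the real configuration. */
    static int openThreads(Map<String, String> conf) {
        String raw = conf.get(OPEN_THREADS_KEY);
        int n = (raw == null) ? DEFAULT_OPEN_THREADS : Integer.parseInt(raw.trim());
        return Math.max(1, n);
    }

    /** Runs every opener on a pool private to the store; results arrive in completion order. */
    static <T> List<T> openAll(List<Callable<T>> openers, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            CompletionService<T> completion = new ExecutorCompletionService<>(pool);
            for (Callable<T> opener : openers) {
                completion.submit(opener);
            }
            List<T> opened = new ArrayList<>(openers.size());
            for (int i = 0; i < openers.size(); i++) {
                opened.add(completion.take().get()); // blocks until the next file is open
            }
            return opened;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while opening store files", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("failed to open a store file", e.getCause());
        } finally {
            pool.shutdownNow(); // pool is released once the store's files are loaded
        }
    }
}
```

Because the pool is created and shut down inside openAll, it does not outlive
the store open and has no effect on the pools sized by
"hbase.hstore.open.and.close.threads.max".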
The number of threads opening store files should scale with the number of
stores, so allocating the pool at the store level continues to make sense.
Longer term, we might try recursively decomposing the region open task with a
fork-join pool so that the opening of store files can be dynamically
parallelized, which is probably a superior approach (conjecture pending a real
attempt with metrics).
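As a starting point for that experiment, recursive decomposition over a flat
list of file openers could be sketched as below. All names are hypothetical,
the openers are modeled as Supplier instances, and the sketch splits only one
level (files), not the full region/store/file hierarchy.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;
import java.util.function.Supplier;

// Hypothetical sketch of the fork-join idea; not HBase source code.
final class ForkJoinOpenSketch {

    /** Opens the files in [from, to), splitting the range in half until one file remains. */
    static final class OpenRange<T> extends RecursiveTask<List<T>> {
        private final List<Supplier<T>> openers;
        private final int from;
        private final int to;

        OpenRange(List<Supplier<T>> openers, int from, int to) {
            this.openers = openers;
            this.from = from;
            this.to = to;
        }

        @Override
        protected List<T> compute() {
            List<T> out = new ArrayList<>(to - from);
            if (to - from <= 1) {
                for (int i = from; i < to; i++) {
                    out.add(openers.get(i).get()); // leaf: open a single file
                }
                return out;
            }
            int mid = (from + to) >>> 1;
            OpenRange<T> left = new OpenRange<T>(openers, from, mid);
            OpenRange<T> right = new OpenRange<T>(openers, mid, to);
            left.fork();                           // left half may be stolen by an idle worker
            List<T> rightResult = right.compute(); // right half runs on the current worker
            out.addAll(left.join());
            out.addAll(rightResult);
            return out;                            // input order is preserved
        }
    }

    static <T> List<T> openAll(List<Supplier<T>> openers, int parallelism) {
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        try {
            return pool.invoke(new OpenRange<T>(openers, 0, openers.size()));
        } finally {
            pool.shutdown();
        }
    }
}
```

Whether this actually beats a fixed-size pool would still need to be measured,
as noted above.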