[ https://issues.apache.org/jira/browse/HBASE-3721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack resolved HBASE-3721.
--------------------------
Resolution: Fixed
Fix Version/s: 0.92.0
Hadoop Flags: [Reviewed]
Committed to TRUNK. Thanks for the patch, Ted (and thanks, Adam, for testing).
> Speedup LoadIncrementalHFiles
> -----------------------------
>
> Key: HBASE-3721
> URL: https://issues.apache.org/jira/browse/HBASE-3721
> Project: HBase
> Issue Type: Improvement
> Components: util
> Reporter: Ted Yu
> Assignee: Ted Yu
> Fix For: 0.92.0
>
> Attachments: 3721-v2.txt, 3721-v3.txt, 3721-v4.txt, 3721-v6.patch,
> 3721.txt, LoadIncrementalHFiles.java
>
>
> From Adam Phelps:
> From the logs, it looks like <1% of the HFiles we're loading have to be split.
> Looking at the code for LoadIncrementalHFiles (HBase v0.90.1), I'm actually
> thinking our problem is that this code loads the hfiles sequentially. Our
> largest table has over 2500 regions and the data being loaded is fairly well
> distributed across them, so there end up being around 2500 HFiles for each
> load period. At 1-2 seconds per HFile that means the loading process is very
> time consuming.
> Currently, server.bulkLoadHFile() is a blocking call.
> We can use an ExecutorService to load HFiles in parallel and make better use
> of multi-core machines.
> A new configuration parameter, "hbase.loadincremental.threads.max", is
> introduced; it sets the maximum number of threads used for parallel bulk load.
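
The approach described above can be sketched roughly as follows. This is a minimal illustration, not the committed patch: the class ParallelBulkLoadSketch, the HFileLoader interface, and the loadAll method are hypothetical names, and HFileLoader only stands in for the blocking server.bulkLoadHFile() call. The thread cap is passed in as a plain int; in HBase it would presumably be read from "hbase.loadincremental.threads.max", and the default used in main() is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelBulkLoadSketch {

    /** Hypothetical stand-in for the blocking server.bulkLoadHFile() call. */
    public interface HFileLoader {
        void load(String hfilePath) throws Exception;
    }

    /**
     * Submits one blocking load per HFile to a fixed-size thread pool and
     * waits for all of them. Returns the number of HFiles loaded; the first
     * failed load surfaces as an ExecutionException.
     */
    public static int loadAll(List<String> hfiles, final HFileLoader loader, int maxThreads)
            throws InterruptedException, ExecutionException {
        // Cap the pool at maxThreads (at least one worker).
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, maxThreads));
        try {
            List<Future<Void>> futures = new ArrayList<Future<Void>>();
            for (final String hfile : hfiles) {
                Callable<Void> task = new Callable<Void>() {
                    @Override
                    public Void call() throws Exception {
                        loader.load(hfile); // blocking call, now runs on a worker thread
                        return null;
                    }
                };
                futures.add(pool.submit(task));
            }
            int loaded = 0;
            for (Future<Void> f : futures) {
                f.get(); // blocks until this load finishes; rethrows its failure
                loaded++;
            }
            return loaded;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Assumption: default the thread cap to the number of available cores.
        int maxThreads = Runtime.getRuntime().availableProcessors();
        List<String> hfiles = Arrays.asList("hfile-1", "hfile-2", "hfile-3", "hfile-4");
        HFileLoader simulated = new HFileLoader() {
            @Override
            public void load(String hfilePath) throws Exception {
                Thread.sleep(100); // simulate a slow, blocking load
            }
        };
        int loaded = loadAll(hfiles, simulated, maxThreads);
        System.out.println("Loaded " + loaded + " HFiles using up to " + maxThreads + " threads");
    }
}
```

With roughly 2500 HFiles at 1-2 seconds each, the sequential loop is bounded by the sum of the per-file latencies, whereas a pool of N workers keeps up to N blocking calls in flight at once. How much that helps in practice depends on how well the region servers absorb the concurrent requests.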