[jira] [Updated] (HBASE-3721) Speedup LoadIncrementalHFiles

Ted Yu (JIRA) Wed, 04 May 2011 15:54:44 -0700

     [ https://issues.apache.org/jira/browse/HBASE-3721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Ted Yu updated HBASE-3721:
--------------------------

    Description: 
From Adam Phelps:
From the logs it looks like <1% of the HFiles we're loading have to be split.  
Looking at the code for LoadIncrementalHFiles (HBase v0.90.1), I'm actually 
thinking our problem is that this code loads the HFiles sequentially.  Our 
largest table has over 2500 regions and the data being loaded is fairly well 
distributed across them, so there end up being around 2500 HFiles for each load 
period.  At 1-2 seconds per HFile, that means the loading process is very time 
consuming.

Currently server.bulkLoadHFile() is a blocking call.
We can use an ExecutorService to achieve better parallelism on multi-core 
machines.

A new configuration parameter, "hbase.loadincremental.threads.max", is 
introduced; it sets the maximum number of threads for parallel bulk load.

  was:
From Adam Phelps:
From the logs it looks like <1% of the HFiles we're loading have to be split.  
Looking at the code for LoadIncrementalHFiles (HBase v0.90.1), I'm actually 
thinking our problem is that this code loads the HFiles sequentially.  Our 
largest table has over 2500 regions and the data being loaded is fairly well 
distributed across them, so there end up being around 2500 HFiles for each load 
period.  At 1-2 seconds per HFile, that means the loading process is very time 
consuming.

Currently server.bulkLoadHFile() is a blocking call.
We can use an ExecutorService to achieve better parallelism on multi-core 
machines.


> Speedup LoadIncrementalHFiles
> -----------------------------
>
>                 Key: HBASE-3721
>                 URL: https://issues.apache.org/jira/browse/HBASE-3721
>             Project: HBase
>          Issue Type: Improvement
>          Components: util
>            Reporter: Ted Yu
>            Assignee: Ted Yu
>         Attachments: 3721-v2.txt, 3721-v3.txt, 3721-v4.txt, 3721-v6.patch, 
> 3721.txt, LoadIncrementalHFiles.java
>
>
> From Adam Phelps:
> From the logs it looks like <1% of the HFiles we're loading have to be split. 
>  Looking at the code for LoadIncrementalHFiles (HBase v0.90.1), I'm actually 
> thinking our problem is that this code loads the HFiles sequentially.  Our 
> largest table has over 2500 regions and the data being loaded is fairly well 
> distributed across them, so there end up being around 2500 HFiles for each 
> load period.  At 1-2 seconds per HFile, that means the loading process is very 
> time consuming.
> Currently server.bulkLoadHFile() is a blocking call.
> We can use an ExecutorService to achieve better parallelism on multi-core 
> machines.
> A new configuration parameter, "hbase.loadincremental.threads.max", is 
> introduced; it sets the maximum number of threads for parallel bulk load.
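
For illustration only, here is a minimal sketch of the approach described 
above: one blocking load per HFile is submitted to a fixed-size thread pool, 
and the pool size is read from "hbase.loadincremental.threads.max". This is not 
the attached patch. The HFileLoader interface is a hypothetical stand-in for 
the blocking server.bulkLoadHFile() call, a plain java.util.Properties object 
stands in for the HBase configuration, and the default thread count (the number 
of available processors) is an assumption, since the description does not state 
one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelBulkLoadSketch {

    /** Hypothetical stand-in for the blocking server.bulkLoadHFile() call. */
    interface HFileLoader {
        void load(String hfilePath) throws Exception;
    }

    /** Reads the thread cap. The default (CPU count) is an assumption of this sketch. */
    static int maxThreads(Properties conf) {
        String value = conf.getProperty("hbase.loadincremental.threads.max");
        int n = (value == null)
                ? Runtime.getRuntime().availableProcessors()
                : Integer.parseInt(value.trim());
        return Math.max(1, n);
    }

    /** Submits one blocking load per HFile to a fixed pool and waits for all of them. */
    static int loadAll(List<String> hfiles, final HFileLoader loader, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Void>> pending = new ArrayList<Future<Void>>();
            for (final String path : hfiles) {
                pending.add(pool.submit(new Callable<Void>() {
                    @Override
                    public Void call() throws Exception {
                        loader.load(path); // still blocking, but now on a pool thread
                        return null;
                    }
                }));
            }
            int loaded = 0;
            for (Future<Void> f : pending) {
                f.get(); // a failed load surfaces here as an ExecutionException
                loaded++;
            }
            return loaded;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Properties conf = new Properties();
        conf.setProperty("hbase.loadincremental.threads.max", "4");

        List<String> hfiles = new ArrayList<String>();
        for (int i = 0; i < 8; i++) {
            hfiles.add("hfile-" + i); // made-up names, for the demo only
        }

        // Fake loader: sleeps briefly to imitate a slow blocking call.
        int loaded = loadAll(hfiles, path -> Thread.sleep(50), maxThreads(conf));
        System.out.println("loaded " + loaded + " HFiles");
    }
}
```

With the figures quoted above, about 2500 HFiles at 1-2 seconds each comes to 
roughly 2500-5000 seconds (about 42-83 minutes) when loaded one at a time. With 
N pool threads the ideal wall-clock time is about 1/N of that, assuming the 
region servers rather than the client are not the bottleneck.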

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
