[jira] [Commented] (HBASE-3721) Speedup LoadIncrementalHFiles

[email protected] (JIRA) Wed, 27 Apr 2011 23:47:48 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-3721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13026146#comment-13026146
 ]

[email protected] commented on HBASE-3721:
------------------------------------------------------

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/572/#review597
-----------------------------------------------------------

/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
<https://reviews.apache.org/r/572/#comment1242>

    I don't necessarily see the point of this being in HConnectionManager, 
since it's just a general submission of a bunch of callables.

    If HConnectionManager did something smarter so that threads sleeping for 
retry didn't lock up a thread to sleep, it would make more sense here.

/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
<https://reviews.apache.org/r/572/#comment1243>

    since these arrays only get filled in in the case that there are 
exceptions, we can expect they'll be very small in general, and the extra code 
to track actionCount isn't likely to make any difference. It's just a 
micro-optimization for something that isn't a bottleneck.

/src/main/java/org/apache/hadoop/hbase/client/HTable.java
<https://reviews.apache.org/r/572/#comment1244>

    why is this public?

/src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java
<https://reviews.apache.org/r/572/#comment1246>

    rather than working in batches, why not just have one thread which is 
submitting tasks to the executor, and another thread which is pulling off 
completed ones? ie I don't see the point of batch size at all instead of 
something more "fluid" - a normal producer/consumer kind of design.

/src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java
<https://reviews.apache.org/r/572/#comment1245>

    i don't like this dependency. Check out Guava's ThreadFactoryBuilder.

- Todd

On 2011-04-09 14:00:23, Ted Yu wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/572/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-04-09 14:00:23)
bq.  
bq.  
bq.  Review request for hbase and Todd Lipcon.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  I refactored LoadIncrementalHFiles so that tryLoad() queues work items in 
List<ServerCallable<Void>>. doBulkLoad() periodically sends batch of 
ServerCallable's to HBase cluster.
bq.  I added the following method to HConnection/HConnectionManager:
bq.      public <T> void getRegionServerWithRetries(ExecutorService pool,
bq.          List<ServerCallable<T>> callables, Object[] results)
bq.  This method uses thread pool to send multiple ServerCallable's through 
getRegionServerWithRetries(ServerCallable<T> callable).
bq.  
bq.  I introduced two new config parameters: hbase.loadincremental.threads.max 
and hbase.loadincremental.batch.size
bq.  hbase.loadincremental.batch.size is for configuring the batch size above 
which HConnection.getRegionServerWithRetries() would be called. In Adam's case, 
there're many small HFiles. LoadIncrementalHFiles shouldn't wait until all 
HFiles have been scanned.
bq.  hbase.loadincremental.threads.max controls the maximum number of threads 
in thread pool.
bq.  
bq.  
bq.  This addresses bug HBASE-3721.
bq.      https://issues.apache.org/jira/browse/HBASE-3721
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    /src/main/java/org/apache/hadoop/hbase/client/HConnection.java 1090500 
bq.    /src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java 
1090500 
bq.    /src/main/java/org/apache/hadoop/hbase/client/HTable.java 1090500 
bq.    
/src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java 
1090500 
bq.  
bq.  Diff: https://reviews.apache.org/r/572/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  TestLoadIncrementalHFiles and TestHFileOutputFormat pass.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Ted
bq.  
bq.

> Speedup LoadIncrementalHFiles
> -----------------------------
>
>                 Key: HBASE-3721
>                 URL: https://issues.apache.org/jira/browse/HBASE-3721
>             Project: HBase
>          Issue Type: Improvement
>          Components: util
>            Reporter: Ted Yu
>            Assignee: Ted Yu
>         Attachments: 3721-v2.txt, 3721-v3.txt, 3721-v4.txt, 3721.txt
>
>
> From Adam Phelps:
> from the logs it looks like <1% of the hfiles we're loading have to be split. 
>  Looking at the code for LoadIncrementHFiles (hbase v0.90.1), I'm actually 
> thinking our problem is that this code loads the hfiles sequentially.  Our 
> largest table has over 2500 regions and the data being loaded is fairly well 
> distributed across them, so there end up being around 2500 HFiles for each 
> load period.  At 1-2 seconds per HFile that means the loading process is very 
> time consuming.
> Currently server.bulkLoadHFile() is a blocking call.
> We can utilize ExecutorService to achieve better parallelism on multi-core 
> computer.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-3721) Speedup LoadIncrementalHFiles

Reply via email to