[ https://issues.apache.org/jira/browse/HBASE-3721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13028475#comment-13028475 ]
[email protected] commented on HBASE-3721:
------------------------------------------------------
bq. On 2011-05-03 21:51:39, Michael Stack wrote:
bq. > Does it work? If it does, I'm good w/ applying it. There are some questions in the below. See what you think Ted.
I ran unit tests (TestHFileOutputFormat and TestLoadIncrementalHFiles) on my patch.
bq. On 2011-05-03 21:51:39, Michael Stack wrote:
bq. > /src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java, line 212
bq. > <https://reviews.apache.org/r/572/diff/2/?file=17706#file17706line212>
bq. >
bq. > Nothing is done w/ the result here. Should it be logged or something?
The return type is Void, so there is no result to log. I do log errors.
bq. On 2011-05-03 21:51:39, Michael Stack wrote:
bq. > /src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java, line 233
bq. > <https://reviews.apache.org/r/572/diff/2/?file=17706#file17706line233>
bq. >
bq. > There are a bunch of these in this patch... white space.
Will remove the extra whitespace in the next patch.
bq. On 2011-05-03 21:51:39, Michael Stack wrote:
bq. > /src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java, line 235
bq. > <https://reviews.apache.org/r/572/diff/2/?file=17706#file17706line235>
bq. >
bq. > Will multiple threads be trying to get a unique name at the same time? Is this a good enough 'unique' name -- table name and incrementing number? Is this per unique table-based name to isolate thread writes to the fs?
I changed regionCount to an AtomicLong, so concurrent threads each get a distinct number.
The unique name isolates filesystem writes made by different threads.
- Ted
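The reply describes isolating each thread's filesystem writes behind a name built from the table name and an AtomicLong counter. A minimal sketch of that idea, assuming names of the form tableName_n; the class UniqueNameSketch and the underscore separator are hypothetical, not code from the patch:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of the "table name + incrementing number" naming scheme.
// The class name and the "_" separator are illustrative, not taken from the patch.
class UniqueNameSketch {
    // Shared by all worker threads; incrementAndGet() is atomic, so two threads
    // can never receive the same number (unlike a plain long with ++).
    private final AtomicLong regionCount = new AtomicLong();
    private final String tableName;

    UniqueNameSketch(String tableName) {
        this.tableName = tableName;
    }

    String nextUniqueName() {
        return tableName + "_" + regionCount.incrementAndGet();
    }

    public static void main(String[] args) throws InterruptedException {
        UniqueNameSketch namer = new UniqueNameSketch("mytable");
        Set<String> names = ConcurrentHashMap.newKeySet();
        ExecutorService pool = Executors.newFixedThreadPool(8);
        // 8 threads x 1000 names each, all drawing from the same counter.
        for (int t = 0; t < 8; t++) {
            pool.execute(() -> {
                for (int i = 0; i < 1000; i++) {
                    names.add(namer.nextUniqueName());
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        // Every generated name is distinct, so the set holds all 8000 of them.
        System.out.println(names.size());
    }
}
```

With a plain long and ++ in place of incrementAndGet(), two threads could read the same value and produce the same name, which is exactly the collision the review comment asks about.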
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/572/#review639
-----------------------------------------------------------
On 2011-04-29 20:48:41, Ted Yu wrote:
bq.
bq. -----------------------------------------------------------
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/572/
bq. -----------------------------------------------------------
bq.
bq. (Updated 2011-04-29 20:48:41)
bq.
bq.
bq. Review request for hbase and Todd Lipcon.
bq.
bq.
bq. Summary
bq. -------
bq.
bq. I refactored LoadIncrementalHFiles so that tryLoad() queues work items in a List<ServerCallable<Void>>. doBulkLoad() periodically sends batches of ServerCallables to the HBase cluster.
bq. I added the following method to HConnection/HConnectionManager:
bq. public <T> void getRegionServerWithRetries(ExecutorService pool, List<ServerCallable<T>> callables, Object[] results)
bq. This method uses a thread pool to send multiple ServerCallables, each through getRegionServerWithRetries(ServerCallable<T> callable).
bq.
bq. I introduced two new config parameters: hbase.loadincremental.threads.max and hbase.loadincremental.batch.size.
bq. hbase.loadincremental.batch.size sets the batch size above which HConnection.getRegionServerWithRetries() is called. In Adam's case there are many small HFiles, so LoadIncrementalHFiles shouldn't wait until all HFiles have been scanned.
bq. hbase.loadincremental.threads.max controls the maximum number of threads in the thread pool.
bq.
bq.
bq. This addresses bug HBASE-3721.
bq. https://issues.apache.org/jira/browse/HBASE-3721
bq.
bq.
bq. Diffs
bq. -----
bq.
bq.
bq. /src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java 1097897
bq.
bq. Diff: https://reviews.apache.org/r/572/diff
bq.
bq.
bq. Testing
bq. -------
bq.
bq. TestLoadIncrementalHFiles and TestHFileOutputFormat pass.
bq.
bq.
bq. Thanks,
bq.
bq. Ted
bq.
bq.
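The queue-and-flush scheme in the review request can be sketched as follows. This is a hypothetical illustration, not the patch itself: java.util.concurrent.Callable<Void> stands in for ServerCallable<Void>, the constructor arguments batchSize and maxThreads play the roles of hbase.loadincremental.batch.size and hbase.loadincremental.threads.max, and the class and method names are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the queue-and-flush pattern: Callable<Void> stands in for
// ServerCallable<Void>; batchSize and maxThreads mirror the two config parameters.
class BatchingLoaderSketch {
    private final int batchSize;           // role of hbase.loadincremental.batch.size
    private final ExecutorService pool;    // sized like hbase.loadincremental.threads.max
    private final List<Callable<Void>> queued = new ArrayList<>();
    private int batchesSent = 0;

    BatchingLoaderSketch(int batchSize, int maxThreads) {
        this.batchSize = batchSize;
        this.pool = Executors.newFixedThreadPool(maxThreads);
    }

    // Plays the role of tryLoad(): queue the item instead of running it inline,
    // and send a batch once batchSize items have accumulated, so loading starts
    // before every HFile has been scanned.
    void enqueue(Callable<Void> work) {
        queued.add(work);
        if (queued.size() >= batchSize) {
            flush();
        }
    }

    // Sends the queued items to the pool and blocks until all of them finish.
    void flush() {
        if (queued.isEmpty()) {
            return;
        }
        List<Callable<Void>> batch = new ArrayList<>(queued);
        queued.clear();
        batchesSent++;
        try {
            for (Future<Void> f : pool.invokeAll(batch)) {
                f.get(); // rethrows an item's failure wrapped in ExecutionException
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while running batch", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("work item failed", e.getCause());
        }
    }

    int batchesSent() {
        return batchesSent;
    }

    // Sends the last, possibly partial, batch and releases the threads.
    void close() {
        try {
            flush();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        AtomicInteger loaded = new AtomicInteger();
        BatchingLoaderSketch loader = new BatchingLoaderSketch(100, 8);
        for (int i = 0; i < 250; i++) {
            loader.enqueue(() -> {
                loaded.incrementAndGet(); // stand-in for one bulk-load call
                return null;
            });
        }
        loader.close();
        System.out.println(loaded.get() + " items in " + loader.batchesSent() + " batches");
    }
}
```

Queuing 250 items with batchSize = 100 sends two full batches while items are still being enqueued and a final batch of 50 on close(), so main prints "250 items in 3 batches". The batch size trades latency (how soon loading starts) against per-batch overhead.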
> Speedup LoadIncrementalHFiles
> -----------------------------
>
> Key: HBASE-3721
> URL: https://issues.apache.org/jira/browse/HBASE-3721
> Project: HBase
> Issue Type: Improvement
> Components: util
> Reporter: Ted Yu
> Assignee: Ted Yu
> Attachments: 3721-v2.txt, 3721-v3.txt, 3721-v4.txt, 3721.txt
>
>
> From Adam Phelps:
> From the logs it looks like <1% of the HFiles we're loading have to be split.
> Looking at the code for LoadIncrementalHFiles (HBase v0.90.1), I'm actually
> thinking our problem is that this code loads the hfiles sequentially. Our
> largest table has over 2500 regions and the data being loaded is fairly well
> distributed across them, so there end up being around 2500 HFiles for each
> load period. At 1-2 seconds per HFile that means the loading process is very
> time consuming.
> Currently server.bulkLoadHFile() is a blocking call.
> We can use an ExecutorService to run several of these calls concurrently and
> make better use of a multi-core machine.
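For scale: at 1-2 seconds per HFile, about 2500 HFiles loaded one at a time take roughly 2500-5000 seconds (about 42-83 minutes) per load period. Because each bulkLoadHFile() call blocks, submitting several to an ExecutorService lets them overlap. Below is a minimal sketch of a dispatch helper with the same shape as the getRegionServerWithRetries(ExecutorService pool, List<ServerCallable<T>> callables, Object[] results) method proposed in the review request. It is an assumption-laden illustration: the name runAll is hypothetical, it uses plain Callable<T> instead of ServerCallable<T>, it performs no retries, storing a failed item's exception in its results slot is my own choice, and Thread.sleep simulates the blocking call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch modeled on getRegionServerWithRetries(pool, callables, results):
// plain Callable<T> replaces ServerCallable<T>, and there is no retry logic.
class ParallelDispatchSketch {

    // Runs every callable on the pool and stores results[i] for callables.get(i).
    // Assumption: a failed item's exception is stored in its results slot.
    static <T> void runAll(ExecutorService pool, List<Callable<T>> callables, Object[] results) {
        // Submit everything first so the blocking calls overlap in time...
        List<Future<T>> futures = new ArrayList<>();
        for (Callable<T> c : callables) {
            futures.add(pool.submit(c));
        }
        // ...then wait for each one, keeping results aligned with the input order.
        for (int i = 0; i < futures.size(); i++) {
            try {
                results[i] = futures.get(i).get();
            } catch (ExecutionException e) {
                results[i] = e.getCause();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for results", e);
            }
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(8);
        List<Callable<String>> work = new ArrayList<>();
        for (int i = 0; i < 8; i++) {
            final int id = i;
            // Thread.sleep stands in for a blocking server.bulkLoadHFile() call.
            work.add(() -> {
                Thread.sleep(200);
                return "hfile-" + id;
            });
        }
        Object[] results = new Object[work.size()];
        long start = System.nanoTime();
        runAll(pool, work, results);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        pool.shutdown();
        // The eight 200 ms sleeps run concurrently, so elapsed time should be
        // near 200 ms rather than the 1600 ms a sequential loop would need.
        System.out.println(results[0] + " .. " + results[7] + " in " + elapsedMs + " ms");
    }
}
```

Submitting all tasks before calling get() on any of them is what allows the calls to overlap; calling submit(c).get() inside one loop would serialize them again. The real speedup on a cluster also depends on how many concurrent bulk loads the region servers can absorb.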