[jira] [Commented] (HBASE-3871) Speedup LoadIncrementalHFiles by parallelizing HFile splitting

[email protected] (JIRA) Fri, 08 Jul 2011 18:56:45 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-3871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13062291#comment-13062291
 ]


[email protected] commented on HBASE-3871:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/704/
-----------------------------------------------------------

(Updated 2011-07-09 01:54:46.893626)


Review request for hbase and Michael Stack.


Changes
-------

Patch version 2 addresses Andrew's comment.
When the loop starting at line 202 is entered, the queue is empty and the main 
thread waits for HFile splitting to feed item(s) into the queue.
The previous patch could wait inappropriately long for the first HFile to 
complete splitting.
The second version limits the time spent waiting for any particular HFile 
to complete splitting.
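
The bounded wait described above can be sketched roughly as follows. This is 
not the code from the patch: BoundedWaitSketch, LoadItem, loadAll and the 
".bottom"/".top" naming are hypothetical stand-ins, and the "split" is a 
placeholder that only produces two names per input. The sketch assumes 
splitting runs on a thread pool and the caller drains a BlockingQueue with a 
timed poll, so that no single wait blocks longer than pollMillis.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class BoundedWaitSketch {

    /** Hypothetical stand-in for one piece of an HFile that is ready to load. */
    static final class LoadItem {
        final String name;
        LoadItem(String name) { this.name = name; }
    }

    /**
     * "Splits" each input on worker threads and drains the results on the
     * calling thread, never blocking longer than pollMillis on a single wait.
     */
    static List<String> loadAll(List<String> hfiles, int threads, long pollMillis) {
        final BlockingQueue<LoadItem> queue = new LinkedBlockingQueue<>();
        final AtomicInteger pending = new AtomicInteger(hfiles.size());
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<String> loaded = new ArrayList<>();
        try {
            for (final String hfile : hfiles) {
                pool.execute(() -> {
                    try {
                        // Placeholder for the real split: emit two halves.
                        queue.add(new LoadItem(hfile + ".bottom"));
                        queue.add(new LoadItem(hfile + ".top"));
                    } finally {
                        // Decrement only after the items are queued, so the
                        // consumer cannot observe "done" with items missing.
                        pending.decrementAndGet();
                    }
                });
            }
            // Continue while a splitter may still produce items or items remain.
            while (pending.get() > 0 || !queue.isEmpty()) {
                LoadItem item;
                try {
                    item = queue.poll(pollMillis, TimeUnit.MILLISECONDS);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    break;
                }
                if (item == null) {
                    continue; // timed out: re-check state instead of blocking on
                }
                loaded.add(item.name); // placeholder for loading into a region
            }
        } finally {
            pool.shutdownNow();
        }
        return loaded;
    }

    public static void main(String[] args) {
        System.out.println(loadAll(Arrays.asList("hfile1", "hfile2"), 2, 100));
    }
}
```

A timed poll is used here instead of take() so that the consumer wakes up 
periodically and re-checks whether any splitter is still running, rather than 
blocking indefinitely on one slow HFile.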


Summary
-------

This JIRA complements HBASE-3721 by parallelizing HFile splitting, which was 
previously done in the main thread.

bq. From Adam w.r.t. HFile splitting:
There's actually a good number of messages of that type (HFile no longer fits 
inside a single region), unfortunately I didn't take a timestamp on just when I 
was running with the patched jars vs the regular ones, however from the logs I 
can say that this is occurring fairly regularly on this system. The cluster I 
tested this on is our backup cluster, the mapreduce jobs on our production 
cluster output HFiles which are copied to the backup and then loaded into HBase 
on both. Since the regions may be somewhat different on the backup cluster I 
would expect it to have to split somewhat regularly.


This addresses bug HBASE-3871.
    https://issues.apache.org/jira/browse/HBASE-3871


Diffs (updated)
-----

  /src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java 1144493 

Diff: https://reviews.apache.org/r/704/diff


Testing
-------

TestHFileOutputFormat and TestLoadIncrementalHFiles passed with this patch.


Thanks,

Ted



> Speedup LoadIncrementalHFiles by parallelizing HFile splitting
> --------------------------------------------------------------
>
>                 Key: HBASE-3871
>                 URL: https://issues.apache.org/jira/browse/HBASE-3871
>             Project: HBase
>          Issue Type: Improvement
>          Components: mapreduce
>    Affects Versions: 0.90.2
>            Reporter: Ted Yu
>            Assignee: Ted Yu
>         Attachments: 3871.patch
>
>
> From Adam w.r.t. HFile splitting:
> There's actually a good number of messages of that type (HFile no longer fits 
> inside a single region), unfortunately I didn't take a timestamp on just when 
> I was running with the patched jars vs the regular ones, however from the 
> logs I can say that this is occurring fairly regularly on this system.  The 
> cluster I tested this on is our backup cluster, the mapreduce jobs on our 
> production cluster output HFiles which are copied to the backup and then 
> loaded into HBase on both.  Since the regions may be somewhat different on 
> the backup cluster I would expect it to have to split somewhat regularly.
> This JIRA complements HBASE-3721 by parallelizing HFile splitting which is 
> done in the main thread.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
