Nathan Schile created HBASE-15428:
-------------------------------------

             Summary: Port auto-balancing to MultiTableInputFormatBase
                 Key: HBASE-15428
                 URL: https://issues.apache.org/jira/browse/HBASE-15428
             Project: HBase
          Issue Type: Improvement
          Components: mapreduce
            Reporter: Nathan Schile
            Assignee: Nathan Schile


Apache Crunch currently uses 
[MultiTableInputFormatBase|https://github.com/apache/crunch/blob/apache-crunch-0.13.0/crunch-hbase/src/main/java/org/apache/crunch/io/hbase/HBaseSourceTarget.java#L88]
 as the default format for reading HBase data. I would like to use the 
functionality provided by 
[HBASE-12590|https://issues.apache.org/jira/browse/HBASE-12590] "A solution for 
data skew in HBase-Mapreduce Job"; however, it is only available in 
TableInputFormatBase. This JIRA is to port the 
[HBASE-12590|https://issues.apache.org/jira/browse/HBASE-12590] changes from 
TableInputFormatBase into MultiTableInputFormatBase. 

I would like to reuse the 
[TableInputFormatBase#calculateRebalancedSplits|https://github.com/apache/hbase/blob/rel/1.2.0/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java#L381]
 and 
[TableInputFormatBase#getSplitKey|https://github.com/apache/hbase/blob/rel/1.2.0/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java#L454]
 methods. Is it OK to call those methods directly from 
MultiTableInputFormatBase, or should I move them to a new class? I can submit 
a patch once I get direction on this question.
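
To make the "new class" option concrete, here is a rough sketch of the kind of 
shared helper both input formats could call. It is not the HBase code: the 
class name SplitRebalancer and the rebalance signature are hypothetical, and it 
works on split sizes only, ignoring row keys and the split-key calculation. It 
assumes the general idea behind the auto-balancing feature: a split much 
larger than the average is cut in two, and runs of small splits are merged.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of HBase.
final class SplitRebalancer {
    private SplitRebalancer() {}

    /**
     * Rebalances split sizes (simplified; real splits also carry row keys).
     * A split of at least skewRatio * average is cut into two halves.
     * A split below the average absorbs the following splits for as long as
     * the merged size stays at or under skewRatio * average.
     */
    static List<Long> rebalance(long[] sizes, double skewRatio) {
        List<Long> out = new ArrayList<>();
        if (sizes.length == 0) {
            return out;
        }
        long total = 0;
        for (long s : sizes) {
            total += s;
        }
        long average = total / sizes.length;
        double threshold = average * skewRatio;

        int i = 0;
        while (i < sizes.length) {
            long size = sizes[i];
            if (size >= threshold && size >= 2) {
                // Skewed split: cut it in two.
                long half = size / 2;
                out.add(half);
                out.add(size - half);
                i++;
            } else if (size >= average) {
                // Close enough to the average: keep as-is.
                out.add(size);
                i++;
            } else {
                // Small split: merge with the splits that follow it.
                long merged = size;
                i++;
                while (i < sizes.length && merged + sizes[i] <= threshold) {
                    merged += sizes[i];
                    i++;
                }
                out.add(merged);
            }
        }
        return out;
    }
}
```

With a helper shaped like this, TableInputFormatBase and 
MultiTableInputFormatBase would each pass in the splits they computed (per 
table, in the multi-table case) rather than one class reaching into the other.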



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
