Nathan Schile created HBASE-15428:
-------------------------------------
Summary: Port auto-balancing to MultiTableInputFormatBase
Key: HBASE-15428
URL: https://issues.apache.org/jira/browse/HBASE-15428
Project: HBase
Issue Type: Improvement
Components: mapreduce
Reporter: Nathan Schile
Assignee: Nathan Schile
Apache Crunch currently uses
[MultiTableInputFormatBase|https://github.com/apache/crunch/blob/apache-crunch-0.13.0/crunch-hbase/src/main/java/org/apache/crunch/io/hbase/HBaseSourceTarget.java#L88]
as the default format for reading HBase data. I would like to use the
functionality provided by
[HBASE-12590|https://issues.apache.org/jira/browse/HBASE-12590] "A solution for
data skew in HBase-Mapreduce Job"; however, it is only available in
TableInputFormatBase. This JIRA is to port the changes made for
[HBASE-12590|https://issues.apache.org/jira/browse/HBASE-12590]
from TableInputFormatBase into MultiTableInputFormatBase.
I would like to use the
[TableInputFormatBase#calculateRebalancedSplits|https://github.com/apache/hbase/blob/rel/1.2.0/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java#L381]
and
[TableInputFormatBase#getSplitKey|https://github.com/apache/hbase/blob/rel/1.2.0/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java#L454]
methods. Is it OK to call those methods directly from
MultiTableInputFormatBase, or should I move them to a new class? I can submit
a patch once I get direction on this question.