TableInputFormat subclass to allow N number of splits per region during MR jobs
-------------------------------------------------------------------------------
Key: HBASE-5140
URL: https://issues.apache.org/jira/browse/HBASE-5140
Project: HBase
Issue Type: New Feature
Components: mapreduce
Reporter: Josh Wymer
Priority: Trivial
In regard to [HBASE-5138|https://issues.apache.org/jira/browse/HBASE-5138], I
am working on a subclass of TableInputFormat that overrides getSplits in order
to generate N splits per region and/or N splits per job. The idea is to convert
each region's startKey and endKey from byte[] to BigDecimal, take the
difference, divide it by N, add successive multiples of that step to the
startKey, convert the results back to byte[], and generate splits at those
values. Assuming the row keys are uniformly distributed across the key space,
this should produce splits containing roughly the same number of rows each.
Any suggestions on this issue are welcome.