TableInputFormat subclass to allow N number of splits per region during MR jobs
-------------------------------------------------------------------------------

                 Key: HBASE-5140
                 URL: https://issues.apache.org/jira/browse/HBASE-5140
             Project: HBase
          Issue Type: New Feature
          Components: mapreduce
            Reporter: Josh Wymer
            Priority: Trivial


With regard to [HBASE-5138|https://issues.apache.org/jira/browse/HBASE-5138], I 
am working on a subclass of TableInputFormat that overrides getSplits to 
generate N splits per region and/or N splits per job. The idea is to convert 
the startKey and endKey of each region from byte[] to BigDecimal, take the 
difference, divide it by N, convert the resulting boundaries back to byte[], 
and generate splits on those values. Assuming your keys are evenly 
distributed, this should produce splits with nearly the same number of rows 
each. Any suggestions on this issue are welcome.
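
The boundary arithmetic described above could look roughly like the sketch 
below. It is not the proposed patch: the class name KeyRangeSplitter and its 
methods are hypothetical, it has no HBase dependency, and it uses BigInteger 
rather than BigDecimal because the keys are treated as unsigned integers. It 
also assumes both keys are non-empty and that right-padding the shorter key 
with zero bytes is acceptable; a region's empty start or end key would have to 
be replaced with a concrete bound by the caller.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of HBase: computes the N-1 interior
// boundaries that cut [startKey, endKey) into N roughly equal ranges.
public class KeyRangeSplitter {

    /** Right-pads a key with zero bytes so both keys have the same length. */
    static byte[] pad(byte[] key, int length) {
        return Arrays.copyOf(key, length);
    }

    /** Writes a non-negative value as exactly `length` big-endian bytes. */
    static byte[] toFixedLength(BigInteger value, int length) {
        // toByteArray() may add a leading sign byte or return fewer bytes.
        byte[] raw = value.toByteArray();
        byte[] out = new byte[length];
        int copy = Math.min(raw.length, length);
        System.arraycopy(raw, raw.length - copy, out, length - copy, copy);
        return out;
    }

    /** Returns the n-1 interior split points between startKey and endKey. */
    public static List<byte[]> splitPoints(byte[] startKey, byte[] endKey, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        int length = Math.max(startKey.length, endKey.length);
        // Signum 1 makes BigInteger read the bytes as an unsigned magnitude.
        BigInteger start = new BigInteger(1, pad(startKey, length));
        BigInteger end = new BigInteger(1, pad(endKey, length));
        BigInteger diff = end.subtract(start);
        if (diff.signum() <= 0) {
            throw new IllegalArgumentException("endKey must sort after startKey");
        }
        List<byte[]> points = new ArrayList<>();
        for (int i = 1; i < n; i++) {
            // start + diff * i / n, multiplying first to limit rounding error.
            BigInteger offset = diff.multiply(BigInteger.valueOf(i))
                                    .divide(BigInteger.valueOf(n));
            points.add(toFixedLength(start.add(offset), length));
        }
        return points;
    }
}
```

An overridden getSplits would then pair consecutive boundaries (startKey, the 
computed points, endKey) into one split each. When a key range is narrower 
than N, some computed points coincide and would need to be de-duplicated.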


        
