[ 
https://issues.apache.org/jira/browse/HBASE-5140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181579#comment-13181579
 ] 

Josh Wymer commented on HBASE-5140:
-----------------------------------

Correct, but on a table with one region, for example, getStartEndKeys() returns 
two empty byte[] values. The last (or only) region of a table returns an 
empty byte[] as its end key, which lets a scan run to the end of the table. 
As a result, we don't know the upper-bound byte[] needed to derive 
the long (or int, etc.) value for the split calculations. So we must 
either have an efficient way to get the last key in this case, or arbitrarily 
set the long to its max value (since nothing could be higher in any case) and 
use that number in the calculations. This obviously won't work for unbounded 
data types like BigDecimal and is a partial solution at best.
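
A minimal sketch of the max-value fallback described above, in plain Java 
with no HBase dependency. It assumes row keys are exactly 8 bytes, encoded as 
non-negative big-endian longs; the class and method names (EndKeyBound, 
lowerBound, upperBound, step) are hypothetical and not part of any HBase API.

```java
import java.nio.ByteBuffer;

class EndKeyBound {
    // Assumption: a non-empty key is exactly 8 bytes, big-endian, non-negative.
    // An empty start key (first or only region) is treated as 0.
    static long lowerBound(byte[] startKey) {
        return startKey.length == 0 ? 0L : ByteBuffer.wrap(startKey).getLong();
    }

    // An empty end key (last or only region) has no real upper bound,
    // so fall back to Long.MAX_VALUE, since no long key can be higher.
    static long upperBound(byte[] endKey) {
        return endKey.length == 0 ? Long.MAX_VALUE : ByteBuffer.wrap(endKey).getLong();
    }

    // Width of each of the n sub-ranges between the two bounds.
    static long step(byte[] startKey, byte[] endKey, int n) {
        return (upperBound(endKey) - lowerBound(startKey)) / n;
    }
}
```

With this fallback, a single-region table whose real keys cluster near zero 
would get most of its rows in the first split, which is one reason the 
approach is only a partial solution.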
                
> TableInputFormat subclass to allow N number of splits per region during MR 
> jobs
> -------------------------------------------------------------------------------
>
>                 Key: HBASE-5140
>                 URL: https://issues.apache.org/jira/browse/HBASE-5140
>             Project: HBase
>          Issue Type: New Feature
>          Components: mapreduce
>            Reporter: Josh Wymer
>            Priority: Trivial
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> In regard to [HBASE-5138|https://issues.apache.org/jira/browse/HBASE-5138], I 
> am working on a subclass of the TableInputFormat class that overrides 
> getSplits in order to generate N splits per region and/or N 
> splits per job. The idea is to convert the startKey and endKey for each 
> region from byte[] to BigDecimal, take the difference, divide by N, convert 
> back to byte[] and generate splits on the resulting values. Assuming your 
> keys are evenly distributed, this should generate splits with nearly the same 
> number of rows per split. Any suggestions on this issue are welcome.
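
The conversion described in the issue could be sketched roughly as follows. 
This is an illustration only, not the proposed patch: it uses BigInteger 
rather than BigDecimal because the keys are treated as whole numbers, it 
assumes both keys are non-empty and of equal length, and the names 
(KeyRangeSplitter, boundaries, toKey) are hypothetical.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

class KeyRangeSplitter {
    // Convert a non-negative value back to a fixed-width big-endian key,
    // left-padding with zeros and dropping BigInteger's extra sign byte.
    static byte[] toKey(BigInteger value, int width) {
        byte[] raw = value.toByteArray();
        byte[] out = new byte[width];
        int copy = Math.min(raw.length, width);
        System.arraycopy(raw, raw.length - copy, out, width - copy, copy);
        return out;
    }

    // Returns n + 1 boundary keys from start to end inclusive;
    // split i covers [boundary i, boundary i + 1).
    static List<byte[]> boundaries(byte[] start, byte[] end, int n) {
        if (start.length == 0 || start.length != end.length) {
            throw new IllegalArgumentException("keys must be non-empty and equal length");
        }
        // Signum 1 makes BigInteger read the bytes as an unsigned magnitude.
        BigInteger lo = new BigInteger(1, start);
        BigInteger range = new BigInteger(1, end).subtract(lo);
        BigInteger parts = BigInteger.valueOf(n);
        List<byte[]> result = new ArrayList<>();
        for (int i = 0; i <= n; i++) {
            // Multiply before dividing so rounding error does not accumulate.
            BigInteger offset = range.multiply(BigInteger.valueOf(i)).divide(parts);
            result.add(toKey(lo.add(offset), start.length));
        }
        return result;
    }
}
```

For example, splitting the one-byte range 0x00 to 0x64 into 4 parts gives the 
boundaries 0, 25, 50, 75 and 100.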

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
