[jira] [Commented] (HBASE-5140) TableInputFormat subclass to allow N number of splits per region during MR jobs

Hadoop QA (Commented) (JIRA) Mon, 09 Jan 2012 15:35:02 -0800

    [ https://issues.apache.org/jira/browse/HBASE-5140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13182930#comment-13182930 ]


Hadoop QA commented on HBASE-5140:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12509974/Added_functionality_to_split_n_times_per_region_on_mapreduce_jobs.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    -1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/711//console

This message is automatically generated.
                
> TableInputFormat subclass to allow N number of splits per region during MR jobs
> -------------------------------------------------------------------------------
>
>                 Key: HBASE-5140
>                 URL: https://issues.apache.org/jira/browse/HBASE-5140
>             Project: HBase
>          Issue Type: New Feature
>          Components: mapreduce
>    Affects Versions: 0.90.4
>            Reporter: Josh Wymer
>            Priority: Trivial
>              Labels: mapreduce, split
>             Fix For: 0.90.4
>
>         Attachments: Added_functionality_to_split_n_times_per_region_on_mapreduce_jobs.patch
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> With regard to [HBASE-5138|https://issues.apache.org/jira/browse/HBASE-5138], I 
> am working on a subclass of the TableInputFormat class that overrides 
> getSplits in order to generate N splits per region and/or N splits per job. 
> The idea is to convert the startKey and endKey of each region from byte[] to 
> BigDecimal, take the difference, divide it by N, convert the results back to 
> byte[], and generate splits at the resulting values. Assuming your keys are 
> uniformly distributed, this should produce splits that each cover nearly the 
> same number of rows. Any suggestions on this issue are welcome.
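
The key arithmetic described above can be sketched roughly as follows. This is only an illustration, not the code in the attached patch: the class name KeyRangeSplitter and its methods are hypothetical, it uses BigInteger rather than the BigDecimal mentioned in the description (the keys are treated as unsigned big-endian integers), and it assumes both keys are non-empty, so the empty start/end keys of the first and last regions are not handled.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of HBase: computes evenly spaced
// boundary keys between a region's start key and end key.
class KeyRangeSplitter {

    // Read the key as an unsigned big-endian integer, right-padded
    // with zero bytes to len so both keys have the same width.
    static BigInteger toNumber(byte[] key, int len) {
        return new BigInteger(1, Arrays.copyOf(key, len));
    }

    // Convert a non-negative value back to a fixed-width key,
    // dropping the sign byte or left-padding with zeros as needed.
    static byte[] toKey(BigInteger value, int len) {
        byte[] raw = value.toByteArray();
        byte[] key = new byte[len];
        int n = Math.min(raw.length, len);
        System.arraycopy(raw, raw.length - n, key, len - n, n);
        return key;
    }

    // Returns n + 1 boundaries: startKey, n - 1 interior keys, endKey.
    // Consecutive pairs delimit the n sub-ranges of the region.
    static List<byte[]> boundaries(byte[] startKey, byte[] endKey, int n) {
        if (n < 1 || startKey.length == 0 || endKey.length == 0) {
            throw new IllegalArgumentException("need n >= 1 and non-empty keys");
        }
        int len = Math.max(startKey.length, endKey.length);
        BigInteger lo = toNumber(startKey, len);
        BigInteger hi = toNumber(endKey, len);
        if (lo.compareTo(hi) >= 0) {
            throw new IllegalArgumentException("startKey must sort before endKey");
        }
        BigInteger range = hi.subtract(lo);
        BigInteger parts = BigInteger.valueOf(n);

        List<byte[]> result = new ArrayList<>();
        result.add(startKey);
        for (int i = 1; i < n; i++) {
            // lo + range * i / n, using integer division
            BigInteger offset = range.multiply(BigInteger.valueOf(i)).divide(parts);
            result.add(toKey(lo.add(offset), len));
        }
        result.add(endKey);
        return result;
    }
}
```

A getSplits override could, under these assumptions, call boundaries(startKey, endKey, N) for each region and emit one split per consecutive pair of keys. If the key range is narrower than N, some interior boundaries will coincide, so a real implementation would need to drop the resulting empty ranges.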

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
