[
https://issues.apache.org/jira/browse/HBASE-1829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756385#action_12756385
]
Lars George commented on HBASE-1829:
------------------------------------
You are right, Michael, it cleans up some remnants from when we could have
different numbers of splits. It also attempts to reduce the split count to the
number of regions that cover the range between start and stop row. The idea with
the comparison is to find the start key of the region just below the start row
and the end key of the region just after the stop row.
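As a rough sketch of the comparison described above (not the attached patch
itself), the following standalone Java shows one way to pick the regions that
overlap a scan range. The class and method names are hypothetical, region keys
are plain byte arrays rather than HBase types, and it assumes half-open ranges
(start inclusive, end exclusive) where an empty key means "unbounded":

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, self-contained sketch; it does not use any HBase classes.
public class SplitPruningSketch {

    /** Unsigned lexicographic comparison of two row keys. */
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    /**
     * Returns the indices of the regions that overlap [startRow, stopRow).
     * Assumptions: an empty startRow means "from the start of the table",
     * an empty stopRow means "to the end of the table", and an empty
     * region end key marks the last region.
     */
    static List<Integer> selectRegions(byte[][] startKeys, byte[][] endKeys,
                                       byte[] startRow, byte[] stopRow) {
        List<Integer> selected = new ArrayList<>();
        for (int i = 0; i < startKeys.length; i++) {
            // The region ends after the scan begins.
            boolean endsAfterStart = startRow.length == 0
                    || endKeys[i].length == 0
                    || compare(startRow, endKeys[i]) < 0;
            // The region begins before the scan ends.
            boolean beginsBeforeStop = stopRow.length == 0
                    || compare(stopRow, startKeys[i]) > 0;
            if (endsAfterStart && beginsBeforeStop) {
                selected.add(i);
            }
        }
        return selected;
    }

    static byte[] key(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Four regions: ["", g), [g, n), [n, t), [t, "")
        byte[][] starts = { key(""), key("g"), key("n"), key("t") };
        byte[][] ends = { key("g"), key("n"), key("t"), key("") };
        // A scan from "h" to "p" crosses the second and third regions.
        System.out.println(selectRegions(starts, ends, key("h"), key("p"))); // prints [1, 2]
    }
}
```

Whether the comparisons should be strict or inclusive at the region boundaries
is exactly the open question; the sketch treats the stop row as exclusive, so a
stop row equal to a region's start key does not select that region.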
I am not sure about the default empty end row, and also about the comparison in
terms of equal versus equal-or-greater etc. I just thought I would get the patch
up as an idea I had, but it is not yet tested. I will test it early next week
and sort out the issues.
Question: is there a testbed that allows having, say, 3-4 regions so that I can
construct various test cases (like start/stop row both in first/last region,
spanning all regions, crossing only two regions, etc.)? I am not too familiar
with the test classes and I know you guys are changing things around. What would
be a good sample to start with?
Otherwise I will test it on my live cluster that has more than enough to test
with. But a unit test seems like a good idea.
> Make use of start/stop row in TableInputFormat
> ----------------------------------------------
>
> Key: HBASE-1829
> URL: https://issues.apache.org/jira/browse/HBASE-1829
> Project: Hadoop HBase
> Issue Type: Improvement
> Components: mapred
> Affects Versions: 0.20.0
> Reporter: Lars George
> Assignee: Lars George
> Priority: Minor
> Fix For: 0.20.1
>
> Attachments: HBASE-1829.patch
>
>
> Since we can now specify a start and stop row with the Scan that is handed to
> the TIF, we can reduce the splits to the regions that contain these rows. That
> allows testing large MR jobs on a single region, for example.