[ https://issues.apache.org/jira/browse/HBASE-5140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Josh Wymer updated HBASE-5140:
------------------------------
Description:
With regard to [HBASE-5138|https://issues.apache.org/jira/browse/HBASE-5138], I
am working on a patch for the TableInputFormat class that overrides getSplits
to generate N splits per region and/or N splits per job. The idea is to convert
each region's startKey and endKey from byte[] to BigDecimal, take the
difference, divide it by N, convert the results back to byte[], and generate
splits at the resulting values. Assuming your keys are uniformly distributed,
this should produce splits with nearly the same number of rows each. Any
suggestions on this issue are welcome.
was:
In regards to [HBASE-5138|https://issues.apache.org/jira/browse/HBASE-5138] I
am working on a subclass for the TableInputFormat class that overrides
getSplits in order to generate N number of splits per regions and/or N number
of splits per job. The idea is to convert the startKey and endKey for each
region from byte[] to BigDecimal, take the difference, divide by N, convert
back to byte[] and generate splits on the resulting values. Assuming your keys
are fully distributed this should generate splits at nearly the same number of
rows per split. Any suggestions on this issue are welcome.
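
The key-range arithmetic in the description can be sketched roughly as
follows. This is only an illustration of the idea, not the attached patch:
the class and method names (KeyRangeSplitter, boundaries, toFixedWidth) are
hypothetical, it uses only the JDK, and it swaps in BigInteger for the
BigDecimal mentioned above because the keys are treated as unsigned integers
with no fractional part. It also assumes both keys are non-empty, so the
empty start/end keys of a table's first and last regions would need separate
handling.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;
import java.util.Arrays;

// Hypothetical helper, not part of HBase: computes evenly spaced boundary
// keys between a region's start key and end key.
final class KeyRangeSplitter {

    // Returns n + 1 boundary keys: the original start, n - 1 interior
    // points, and the original end. Consecutive pairs form the n sub-ranges.
    static List<byte[]> boundaries(byte[] start, byte[] end, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        // Right-pad both keys with zero bytes to a common width so that
        // comparing them as unsigned integers matches byte-wise ordering.
        int width = Math.max(start.length, end.length);
        BigInteger lo = new BigInteger(1, Arrays.copyOf(start, width));
        BigInteger hi = new BigInteger(1, Arrays.copyOf(end, width));
        if (lo.compareTo(hi) >= 0) {
            throw new IllegalArgumentException("start must sort before end");
        }
        BigInteger range = hi.subtract(lo);
        BigInteger parts = BigInteger.valueOf(n);

        List<byte[]> out = new ArrayList<>();
        out.add(start);
        for (int i = 1; i < n; i++) {
            // lo + floor(range * i / n); multiplying before dividing keeps
            // the rounding error below one unit per boundary.
            BigInteger point =
                lo.add(range.multiply(BigInteger.valueOf(i)).divide(parts));
            out.add(toFixedWidth(point, width));
        }
        out.add(end);
        return out;
    }

    // Converts a non-negative value back to exactly `width` bytes,
    // dropping BigInteger's optional leading sign byte and left-padding
    // with zeros when the value is short.
    static byte[] toFixedWidth(BigInteger value, int width) {
        byte[] raw = value.toByteArray();
        byte[] out = new byte[width];
        int copy = Math.min(raw.length, width);
        System.arraycopy(raw, raw.length - copy, out, width - copy, copy);
        return out;
    }
}
```

Each adjacent pair of returned keys would serve as the start and stop row of
one split. When the key range is narrower than N, some interior boundaries
come out identical, so a real implementation would probably want to drop the
resulting empty ranges.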
> TableInputFormat subclass to allow N number of splits per region during MR
> jobs
> -------------------------------------------------------------------------------
>
> Key: HBASE-5140
> URL: https://issues.apache.org/jira/browse/HBASE-5140
> Project: HBase
> Issue Type: New Feature
> Components: mapreduce
> Affects Versions: 0.90.4
> Reporter: Josh Wymer
> Priority: Trivial
> Labels: mapreduce, split
> Fix For: 0.90.4
>
> Attachments:
> Added_functionality_to_TableInputFormat_that_allows_splitting_of_regions.patch,
> Added_functionality_to_TableInputFormat_that_allows_splitting_of_regions.patch.1,
> Added_functionality_to_split_n_times_per_region_on_mapreduce_jobs.patch
>
> Original Estimate: 72h
> Remaining Estimate: 72h