[
https://issues.apache.org/jira/browse/HBASE-10017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13839703#comment-13839703
]
Nick Dimiduk commented on HBASE-10017:
--------------------------------------
Multiple splits are handled through retrying. Splits are made and the halves
rewritten as independent HFiles with each pass, so this should be okay.
[~rn] I'm very concerned about the bulkload data loss issue, but I cannot
reproduce it using our existing unit tests (TestHRegionServerBulkLoad). Are you
able to demonstrate the loss in a test? As [~enis] said, TOP should be used for
generating HFiles. Bulkload itself isn't performed inside a MapReduce
job, so I'm confused about how the HRegionPartitioner comes into play in this
scenario.
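
For reference, the usual bulk-load flow looks roughly like this (a minimal
sketch assuming the 0.94-era API; the table name and output path are
placeholders, not taken from this issue):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;
import org.apache.hadoop.mapreduce.Job;

public class BulkLoadSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "mytable");

    // MapReduce phase: generate HFiles. configureIncrementalLoad() wires in
    // the reducer, the output format and a total-order partitioner built from
    // the table's current region boundaries.
    Job job = new Job(conf, "hfile-generation");
    HFileOutputFormat.configureIncrementalLoad(job, table);
    // ... set the mapper, input/output paths, then job.waitForCompletion(true)

    // Load phase: a client-side call, no MapReduce job at all. It moves the
    // generated HFiles into the regions, splitting and retrying as needed.
    LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
    loader.doBulkLoad(new Path("/tmp/hfile-output"), table);
  }
}
{code}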
> HRegionPartitioner, rows directed to last partition are wrongly mapped.
> -----------------------------------------------------------------------
>
> Key: HBASE-10017
> URL: https://issues.apache.org/jira/browse/HBASE-10017
> Project: HBase
> Issue Type: Bug
> Components: mapreduce
> Affects Versions: 0.94.6
> Reporter: Roman Nikitchenko
> Priority: Critical
> Attachments: HBASE-10017-r1544633.patch, HBASE-10017-r1544633.patch,
> patchSiteOutput.txt
>
>
> Inside the HRegionPartitioner class there is a getPartition() method which
> should map the first numPartitions regions to the corresponding partitions
> 1:1. But because of its condition, the last region is hashed instead, which
> can leave the last reducer with no data at all. This is considered a serious
> issue.
> I could reproduce this only starting from 16 regions per table. The original
> defect was found in 0.94.6, but at least today's trunk and the 0.91 branch
> head have the same HRegionPartitioner code in this part, which means they
> share the same issue.
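
For context, the getPartition() logic described above has roughly this shape
(a simplified sketch, not the exact trunk source; startKeys holds the region
start keys and regionStartKey is the start key of the row's region):

{code}
int getPartition(byte[] regionStartKey, byte[][] startKeys, int numPartitions) {
  for (int i = 0; i < startKeys.length; i++) {
    if (Bytes.compareTo(regionStartKey, startKeys[i]) == 0) {
      if (i >= numPartitions - 1) {
        // Meant only for the "fewer reducers than regions" case, but ">="
        // also catches region numPartitions - 1 itself: rows of the last
        // region get hashed onto some earlier partition instead of being
        // mapped 1:1, so the last reducer can end up with no data.
        return (Integer.toString(i).hashCode() & Integer.MAX_VALUE) % numPartitions;
      }
      return i;
    }
  }
  return 0;
}
{code}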
--
This message was sent by Atlassian JIRA
(v6.1#6144)