[ https://issues.apache.org/jira/browse/HBASE-10017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13839703#comment-13839703 ]

Nick Dimiduk commented on HBASE-10017:
--------------------------------------

Multiple splits are handled through retrying: on each pass, a file that crosses
a region boundary is split, and its two halves are rewritten as independent
HFiles, so this should be okay.
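
A toy model of that retry loop may help. It is only in the spirit of the
bulkload splitting described above, not the LoadIncrementalHFiles code: integer
keys stand in for row keys, a "file" is an inclusive key range, and the class
and method names are hypothetical. Each pass either accepts a file that fits in
one region or splits it at the next region boundary and queues both halves
again, so a file spanning three regions is split twice.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical toy model of split-and-retry bulkloading; not HBase API.
final class SplitRetrySketch {
    // A "file": the inclusive range of (integer) row keys it contains.
    static final class Range {
        final int first;
        final int last;
        Range(int first, int last) { this.first = first; this.last = last; }
        @Override public String toString() { return "[" + first + ", " + last + "]"; }
    }

    // Index of the region holding key: the last start key <= key.
    // startKeys must be sorted, and startKeys[0] must be <= every key.
    static int regionOf(int[] startKeys, int key) {
        int idx = 0;
        for (int i = 0; i < startKeys.length; i++) {
            if (startKeys[i] <= key) idx = i;
        }
        return idx;
    }

    // Each pass takes one queued file: if it fits in a single region it is
    // "loaded"; otherwise it is split at the next region boundary and both
    // halves are queued again as independent files.
    static List<Range> load(int[] startKeys, Range file) {
        List<Range> loaded = new ArrayList<>();
        Deque<Range> queue = new ArrayDeque<>();
        queue.add(file);
        while (!queue.isEmpty()) {
            Range r = queue.poll();
            int region = regionOf(startKeys, r.first);
            boolean isLastRegion = region == startKeys.length - 1;
            if (isLastRegion || r.last < startKeys[region + 1]) {
                loaded.add(r);
            } else {
                int boundary = startKeys[region + 1];
                queue.add(new Range(r.first, boundary - 1)); // bottom half
                queue.add(new Range(boundary, r.last));      // top half
            }
        }
        return loaded;
    }

    public static void main(String[] args) {
        int[] startKeys = {0, 10, 20};
        // A file spanning all three regions; prints [[0, 9], [10, 19], [20, 29]]
        System.out.println(load(startKeys, new Range(0, 29)));
    }
}
```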

[~rn] I'm very concerned about the bulkload data loss issue, but I cannot
reproduce it using our existing unit tests (TestHRegionServerBulkLoad). Are you
able to demonstrate the loss in a test? As [~enis] said, TOP
(TotalOrderPartitioner) should be used for generating HFiles. Bulkload itself
isn't performed inside a MapReduce job, so I'm confused about how
HRegionPartitioner comes into play in this scenario.
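
For context on why TOP fits HFile generation: HFileOutputFormat's
configureIncrementalLoad is documented to set one reduce task per region and to
configure TotalOrderPartitioner with a partitions file built from the region
start keys, so each reducer receives one region's contiguous key range. The
sketch below is a hypothetical model of that range lookup, not Hadoop's class:
it uses ASCII String keys (Java's String order, not HBase's unsigned byte
order) and a plain binary search over the split points.

```java
import java.util.Arrays;

// Hypothetical sketch of total-order (range) partitioning; not Hadoop's class.
final class RangePartitionSketch {
    // splitPoints: sorted region start keys, excluding the first region's
    // empty start key, so there are (number of regions - 1) of them.
    static int partition(String[] splitPoints, String key) {
        int pos = Arrays.binarySearch(splitPoints, key);
        // Exact hit: the key is a region start key, i.e. the first row of
        // region pos + 1. Miss: binarySearch returns -(insertionPoint) - 1,
        // and the insertion point is the number of split points below the key.
        return pos >= 0 ? pos + 1 : -(pos + 1);
    }

    public static void main(String[] args) {
        String[] splitPoints = {"row-b", "row-d"}; // three regions
        for (String key : new String[] {"row-a", "row-b", "row-c", "row-z"}) {
            System.out.println(key + " -> reducer " + partition(splitPoints, key));
        }
    }
}
```

Because every reducer owns exactly one key range, no reducer is left empty as
long as each region receives at least one row.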

> HRegionPartitioner, rows directed to last partition are wrongly mapped.
> -----------------------------------------------------------------------
>
>                 Key: HBASE-10017
>                 URL: https://issues.apache.org/jira/browse/HBASE-10017
>             Project: HBase
>          Issue Type: Bug
>          Components: mapreduce
>    Affects Versions: 0.94.6
>            Reporter: Roman Nikitchenko
>            Priority: Critical
>         Attachments: HBASE-10017-r1544633.patch, HBASE-10017-r1544633.patch, 
> patchSiteOutput.txt
>
>
> The getPartition() method of the HRegionPartitioner class should map the
> first numPartitions regions to partitions 1:1. However, because of its
> condition, the region meant for the last partition is hashed instead, which
> can leave the last reducer without any data. This is considered a serious issue.
> I reproduced this only starting from 16 regions per table. The original defect
> was found in 0.94.6, but at least today's trunk and the 0.91 branch head have
> the same HRegionPartitioner code in this part, which means the same issue.
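
The mapping in the description can be modelled with a small, hypothetical
class: region index i (its position among the table's sorted start keys)
should go to partition i while i < numPartitions, and only surplus regions
should fall back to a hash. The buggyPartition variant assumes the defect is an
off-by-one in that condition (i >= numPartitions - 1), which sends the region
meant for the last partition through the hash; the string-hash fallback is
likewise an assumption, not a quote of the HBase source.

```java
// Hypothetical model of the 1:1 region-to-partition mapping; not HBase code.
final class PartitionMappingSketch {
    // Fallback for regions beyond the reducer count: hash the region index.
    static int hashed(int i, int numPartitions) {
        return (Integer.toString(i).hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Assumed defective condition: the last "1:1" index is hashed as well.
    static int buggyPartition(int i, int numPartitions) {
        return (i >= numPartitions - 1) ? hashed(i, numPartitions) : i;
    }

    // Intended behaviour: indices 0..numPartitions-1 map 1:1.
    static int fixedPartition(int i, int numPartitions) {
        return (i >= numPartitions) ? hashed(i, numPartitions) : i;
    }

    // Counts reducers that receive no region when region indices
    // 0..numRegions-1 are mapped with the chosen strategy.
    static int emptyReducers(int numRegions, int numPartitions, boolean buggy) {
        boolean[] used = new boolean[numPartitions];
        for (int i = 0; i < numRegions; i++) {
            int p = buggy ? buggyPartition(i, numPartitions) : fixedPartition(i, numPartitions);
            used[p] = true;
        }
        int empty = 0;
        for (boolean u : used) {
            if (!u) empty++;
        }
        return empty;
    }

    public static void main(String[] args) {
        System.out.println(buggyPartition(15, 16));       // prints 4
        System.out.println(fixedPartition(15, 16));       // prints 15
        System.out.println(emptyReducers(16, 16, true));  // prints 1
        System.out.println(emptyReducers(16, 16, false)); // prints 0
    }
}
```

With 16 regions and 16 reducers, the assumed buggy condition hashes index 15
("15".hashCode() is 1572, and 1572 % 16 is 4) onto partition 4, so reducer 15
receives nothing; the fixed condition keeps the mapping 1:1.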



--
This message was sent by Atlassian JIRA
(v6.1#6144)
