[ 
https://issues.apache.org/jira/browse/HBASE-10017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13839681#comment-13839681
 ] 

Enis Soztutar commented on HBASE-10017:
---------------------------------------

bq. I have reproduced data loss during bulk load. This happens under the same 
conditions as the initial bug: 16 regions per table, though I think it's not 
the only case. Again, the partitioner wrongly maps the last region's data, and 
the resulting region HFile contains keys that should not appear there.

This partitioner is not intended to be used for bulk load; the javadoc already 
says so. TotalOrderPartitioner should be used instead. If the regions change, 
LoadIncrementalHFiles checks the boundaries (although I am not sure whether it 
handles multiple splits of the same range, or merges). 

Other than that, the changes seem OK. However, I think we should get the 
region boundaries at the start, and treat the range as immutable for the 
lifetime of the partitioner. Although the table regions might undergo 
changes, we can at least guarantee a consistent mapping for key ranges. We can 
do a table.getStartKeys() and then a binary search for the key range, taking 
into account the special region boundaries (empty start and stop rows). 
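
The snapshot-then-binary-search idea above could look roughly like the sketch 
below. It is plain Java with no HBase dependency: StartKeyRangeLookup and its 
methods are hypothetical names, not existing HBase code. It assumes the start 
keys (for example, the result of table.getStartKeys()) are sorted and that the 
first one is the empty start row. Folding regions beyond numPartitions with a 
modulo is also only an assumption, not what the patch does.

```java
/** Hypothetical helper (not part of HBase): immutable snapshot of region start keys. */
final class StartKeyRangeLookup {
    private final byte[][] startKeys;

    /** Copies the sorted start keys once; later region changes are not seen. */
    StartKeyRangeLookup(byte[][] sortedStartKeys) {
        if (sortedStartKeys.length == 0 || sortedStartKeys[0].length != 0) {
            throw new IllegalArgumentException("first start key must be the empty row");
        }
        startKeys = new byte[sortedStartKeys.length][];
        for (int i = 0; i < sortedStartKeys.length; i++) {
            startKeys[i] = sortedStartKeys[i].clone();
        }
    }

    /** Index of the last start key that is <= row, i.e. the region holding row. */
    int regionIndexFor(byte[] row) {
        int lo = 0;
        int hi = startKeys.length - 1;
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1;
            if (compareUnsigned(startKeys[mid], row) <= 0) {
                lo = mid;      // row is at or after this region's start
            } else {
                hi = mid - 1;  // row sorts before this region's start
            }
        }
        // The last region has an empty stop row, so everything >= its start lands here.
        return lo;
    }

    /** Maps the region index to a partition; the modulo fold is an assumption. */
    int partitionFor(byte[] row, int numPartitions) {
        int region = regionIndexFor(row);
        return region < numPartitions ? region : region % numPartitions;
    }

    /** Lexicographic comparison treating bytes as unsigned, like HBase row keys. */
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }
}
```

Because the keys are copied in the constructor, the same row always maps to the 
same index for the lifetime of the object, even if the table splits meanwhile. 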

> HRegionPartitioner, rows directed to last partition are wrongly mapped.
> -----------------------------------------------------------------------
>
>                 Key: HBASE-10017
>                 URL: https://issues.apache.org/jira/browse/HBASE-10017
>             Project: HBase
>          Issue Type: Bug
>          Components: mapreduce
>    Affects Versions: 0.94.6
>            Reporter: Roman Nikitchenko
>            Priority: Critical
>         Attachments: HBASE-10017-r1544633.patch, HBASE-10017-r1544633.patch, 
> patchSiteOutput.txt
>
>
> Inside the HRegionPartitioner class there is a getPartition() method which 
> should map the first numPartitions regions to the appropriate partitions 1:1. 
> But based on its condition, the last region is hashed, which could lead to the 
> last reducer not having any data. This is considered a serious issue.
> I reproduced this only starting from 16 regions per table. The original defect 
> was found in 0.94.6, but at least today's trunk and the 0.91 branch head have 
> the same HRegionPartitioner code in this part, which means the same issue.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
