[ 
https://issues.apache.org/jira/browse/HBASE-15398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15181748#comment-15181748
 ] 

Ted Yu commented on HBASE-15398:
--------------------------------

This is what I meant:
http://pastebin.com/mdLyUsc1

> Cells loss or disorder when using family essential filter and partial 
> scanning protocol
> ---------------------------------------------------------------------------------------
>
>                 Key: HBASE-15398
>                 URL: https://issues.apache.org/jira/browse/HBASE-15398
>             Project: HBase
>          Issue Type: Bug
>          Components: dataloss, Scanners
>    Affects Versions: 1.2.0, 1.1.3
>            Reporter: Phil Yang
>            Assignee: Phil Yang
>            Priority: Critical
>         Attachments: 15398-test.txt
>
>
> In RegionScannerImpl, we have two heaps, storeHeap and joinedHeap. If we have 
> a filter and it doesn't apply to all cf, the stores whose families needn't be 
>  filtered will be in joinedHeap. We scan storeHeap first, then joinedHeap, 
> and merge the results and sort and return to client. We need sort because the 
> order of Cell is rowkey/cf/cq/ts and a smaller cf may be in the joinedHeap.
> However, after HBASE-11544 we may transfer partial results when we get 
> SIZE_LIMIT_REACHED_MID_ROW or other similar states. We may return a larger cf 
> first because it is in storeHeap and then a smaller cf because it is in 
> joinedHeap. Server won't hold all cells in a row and client doesn't have a 
> sorting logic. The order of cf in Result for user is wrong.
> And a more critical bug is, if we get a LIMIT_REACHED_MID_ROW on the last 
> cell of a row in storeHeap, we will break scanning in RegionScannerImpl and 
> in populateResult we will change the state to SIZE_LIMIT_REACHED because next 
> peeked cell is next row. But this is only the last cell of one and we have 
> two... And SIZE_LIMIT_REACHED means this Result is not partial (by 
> ScannerContext.partialResultFormed), client will see it and merge them and 
> return to user with losing data of joinedHeap. On next scan we will read next 
> row of storeHeap and joinedHeap is forgotten and never be read...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to