[
https://issues.apache.org/jira/browse/HBASE-15398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15180178#comment-15180178
]
Ted Yu commented on HBASE-15398:
--------------------------------
On the server side, should we enforce that an essential-family filter and
setAllowPartialResults(true) cannot be used together, given that fixing this
bug may take some time?
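Below is a minimal sketch of such a check in Java. Everything in it is hypothetical: PartialScanGuard, ScanSettings and validate are not HBase APIs, and ScanSettings only stands in for the Scan settings involved. The sketch assumes a store ends up in joinedHeap only when the scan loads column families on demand and the filter reports that family as non-essential. That is our reading of RegionScannerImpl and should be checked against the source.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical server-side guard, not an HBase API.
public class PartialScanGuard {

    // Simplified stand-in for the Scan settings that matter here.
    static final class ScanSettings {
        final boolean allowPartialResults;
        final boolean loadColumnFamiliesOnDemand; // assumption: joinedHeap needs this
        final Predicate<String> isFamilyEssential; // null means "no filter"

        ScanSettings(boolean allowPartialResults,
                     boolean loadColumnFamiliesOnDemand,
                     Predicate<String> isFamilyEssential) {
            this.allowPartialResults = allowPartialResults;
            this.loadColumnFamiliesOnDemand = loadColumnFamiliesOnDemand;
            this.isFamilyEssential = isFamilyEssential;
        }
    }

    // True if at least one family would be routed to joinedHeap.
    static boolean usesJoinedHeap(ScanSettings scan, List<String> families) {
        if (scan.isFamilyEssential == null || !scan.loadColumnFamiliesOnDemand) {
            return false; // every store stays in storeHeap
        }
        for (String family : families) {
            if (!scan.isFamilyEssential.test(family)) {
                return true;
            }
        }
        return false;
    }

    // Rejects scans that combine partial results with a non-empty joinedHeap.
    static void validate(ScanSettings scan, List<String> families) {
        if (scan.allowPartialResults && usesJoinedHeap(scan, families)) {
            throw new IllegalArgumentException(
                "allowPartialResults cannot be combined with non-essential families");
        }
    }

    public static void main(String[] args) {
        List<String> families = List.of("a", "b");
        Predicate<String> onlyBEssential = "b"::equals;
        validate(new ScanSettings(false, true, onlyBEssential), families); // accepted
        try {
            validate(new ScanSettings(true, true, onlyBEssential), families);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Rejecting the request outright is the simplest way to make the conflict visible. A softer alternative would be to ignore allowPartialResults for such scans, but that could silently change client behavior.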
> Cells loss or disorder when using family essential filter and partial
> scanning protocol
> ---------------------------------------------------------------------------------------
>
> Key: HBASE-15398
> URL: https://issues.apache.org/jira/browse/HBASE-15398
> Project: HBase
> Issue Type: Bug
> Components: dataloss, Scanners
> Affects Versions: 1.2.0, 1.1.3
> Reporter: Phil Yang
> Assignee: Phil Yang
> Priority: Critical
> Attachments: 15398-test.txt
>
>
> In RegionScannerImpl, we have two heaps, storeHeap and joinedHeap. If we have
> a filter that does not apply to all column families (cf), the stores whose
> families need not be filtered are placed in joinedHeap. We scan storeHeap
> first, then joinedHeap, then merge and sort the results before returning
> them to the client. Sorting is needed because Cells are ordered by
> rowkey/cf/cq/ts and a smaller cf may be in joinedHeap.
> However, since HBASE-11544 we may return partial results when we get
> SIZE_LIMIT_REACHED_MID_ROW or a similar state. We may then return a larger cf
> first because it is in storeHeap, and a smaller cf afterwards because it is
> in joinedHeap. The server does not hold all cells of the row and the client
> has no sorting logic, so the order of cfs in the Result the user receives is
> wrong.
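Below is a toy model of the ordering problem above, in Java. It is not HBase code. Cell, wholeRow and partialRow are hypothetical, and each heap holds a single cell of row r1. wholeRow merges and sorts both heaps, as the server does for a complete row. partialRow assumes the storeHeap cells are shipped as one partial Result and the joinedHeap cells as the next, with the client concatenating them without re-sorting.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy model (not HBase code) of one row split across storeHeap and joinedHeap.
public class JoinedHeapOrder {

    // Simplified cell, ordered by row, then family, then qualifier.
    static final class Cell {
        final String row, family, qualifier;

        Cell(String row, String family, String qualifier) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
        }

        @Override
        public String toString() {
            return family + ":" + qualifier;
        }
    }

    static final Comparator<Cell> CELL_ORDER = Comparator
        .comparing((Cell c) -> c.row)
        .thenComparing(c -> c.family)
        .thenComparing(c -> c.qualifier);

    // Whole-row path: merge both heaps and sort before returning.
    static List<Cell> wholeRow(List<Cell> storeHeapCells, List<Cell> joinedHeapCells) {
        List<Cell> merged = new ArrayList<>(storeHeapCells);
        merged.addAll(joinedHeapCells);
        merged.sort(CELL_ORDER);
        return merged;
    }

    // Partial path: two partial Results concatenated by the client, no re-sort.
    static List<Cell> partialRow(List<Cell> storeHeapCells, List<Cell> joinedHeapCells) {
        List<Cell> clientView = new ArrayList<>(storeHeapCells); // first partial Result
        clientView.addAll(joinedHeapCells);                      // second partial Result
        return clientView;
    }

    public static void main(String[] args) {
        List<Cell> store = List.of(new Cell("r1", "b", "q1"));
        List<Cell> joined = List.of(new Cell("r1", "a", "q1"));
        System.out.println(wholeRow(store, joined));   // [a:q1, b:q1]
        System.out.println(partialRow(store, joined)); // [b:q1, a:q1]
    }
}
```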
> An even more critical bug: if we get a LIMIT_REACHED_MID_ROW state on the
> last cell of a row in storeHeap, we break out of scanning in
> RegionScannerImpl, and populateResult changes the state to
> SIZE_LIMIT_REACHED because the next peeked cell belongs to the next row. But
> that is only the last cell of the row in one of the two heaps; joinedHeap
> still holds cells for it. Because SIZE_LIMIT_REACHED means this Result is not
> partial (see ScannerContext.partialResultFormed), the client treats the row
> as complete, merges the partial Results, and returns them to the user
> without the joinedHeap data. On the next scan we read the next row of
> storeHeap, and the remaining joinedHeap cells are never read.
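Below is a toy model of the data-loss path, in Java. It is not HBase code. NextState keeps only two of the states named above, and partialResultFormed mirrors the behavior of ScannerContext.partialResultFormed as the description states it. Both clientRow and the non-buggy branch of stateAfterLimit are hypothetical simplifications, not the actual fix. The non-buggy branch also checks whether joinedHeap still has cells for the current row.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (not HBase code) of the mid-row state downgrade.
public class MidRowStateDemo {

    enum NextState { SIZE_LIMIT_REACHED, SIZE_LIMIT_REACHED_MID_ROW }

    // As described: only the MID_ROW state marks a Result as partial.
    static boolean partialResultFormed(NextState state) {
        return state == NextState.SIZE_LIMIT_REACHED_MID_ROW;
    }

    // State reported after the limit is hit on the last storeHeap cell of a row.
    // buggy=true peeks only storeHeap; buggy=false (hypothetical) also asks
    // whether joinedHeap still holds cells for the current row.
    static NextState stateAfterLimit(boolean storeHeapNextIsNewRow,
                                     boolean joinedHeapHasCurrentRow,
                                     boolean buggy) {
        if (!storeHeapNextIsNewRow) {
            return NextState.SIZE_LIMIT_REACHED_MID_ROW;
        }
        if (!buggy && joinedHeapHasCurrentRow) {
            return NextState.SIZE_LIMIT_REACHED_MID_ROW;
        }
        return NextState.SIZE_LIMIT_REACHED;
    }

    // Cells the client ends up returning to the user for row r1.
    static List<String> clientRow(boolean buggy) {
        List<String> storeHeapCells = List.of("r1/b:q1");
        List<String> joinedHeapCells = List.of("r1/a:q1");
        List<String> row = new ArrayList<>(storeHeapCells);
        NextState state = stateAfterLimit(true, !joinedHeapCells.isEmpty(), buggy);
        if (partialResultFormed(state)) {
            // Client keeps accumulating; the next RPC resumes r1 from joinedHeap.
            row.addAll(joinedHeapCells);
        }
        // Otherwise the client treats r1 as complete; the next RPC starts at the
        // next storeHeap row and r1's joinedHeap cells are never read.
        return row;
    }

    public static void main(String[] args) {
        System.out.println(clientRow(true));  // [r1/b:q1]
        System.out.println(clientRow(false)); // [r1/b:q1, r1/a:q1]
    }
}
```

Even the non-buggy variant returns family b before family a. That is the ordering problem from the first paragraph of the description.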
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)