[
https://issues.apache.org/jira/browse/HBASE-15398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15197514#comment-15197514
]
Phil Yang commented on HBASE-15398:
-----------------------------------
{quote}
In your list, #3 is the optional. #1 and #2 are required/fundamentals.
{quote}
I see, so we can ban this kind of filter when we cannot guarantee the two
fundamentals, right?
I checked the code and ran a small test locally: as long as a filter's
hasFilterRow() returns true, we will not return partial results. This is
reasonable because this kind of filter needs to see the whole row when
filterRowCells or filterRow is called.
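To illustrate the behaviour described above, here is a toy model in plain Java.
It is only a sketch and does not use HBase classes: RowFilter,
partialResultsAllowed and chunkRow are hypothetical names, and only the
hasFilterRow() method name comes from the real Filter API. The actual server
logic is more involved than this.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only; these types are hypothetical and are not HBase classes. */
final class PartialResultGate {

    /** Stand-in exposing the single Filter method discussed above. */
    interface RowFilter {
        boolean hasFilterRow();
    }

    /** Partial results are allowed only when no filter needs the whole row. */
    static boolean partialResultsAllowed(RowFilter filter) {
        return filter == null || !filter.hasFilterRow();
    }

    /**
     * Splits one row's cells into chunks of at most maxCells when partial
     * results are allowed; otherwise returns the whole row as a single chunk.
     */
    static List<List<String>> chunkRow(List<String> rowCells, int maxCells, RowFilter filter) {
        if (maxCells <= 0) {
            throw new IllegalArgumentException("maxCells must be positive");
        }
        List<List<String>> chunks = new ArrayList<>();
        if (!partialResultsAllowed(filter)) {
            // The filter must see every cell of the row, so never split it.
            chunks.add(new ArrayList<>(rowCells));
            return chunks;
        }
        for (int i = 0; i < rowCells.size(); i += maxCells) {
            int end = Math.min(i + maxCells, rowCells.size());
            chunks.add(new ArrayList<>(rowCells.subList(i, end)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<String> row = List.of("cf1:a", "cf1:b", "cf2:a");
        // Filter without row-level logic: the row may be split.
        System.out.println(chunkRow(row, 2, () -> false));
        // Filter with hasFilterRow() == true: the row stays whole.
        System.out.println(chunkRow(row, 2, () -> true));
    }
}
```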
> Cells loss or disorder when using family essential filter and partial
> scanning protocol
> ---------------------------------------------------------------------------------------
>
> Key: HBASE-15398
> URL: https://issues.apache.org/jira/browse/HBASE-15398
> Project: HBase
> Issue Type: Bug
> Components: dataloss, Scanners
> Affects Versions: 1.2.0, 1.1.3
> Reporter: Phil Yang
> Assignee: Phil Yang
> Priority: Critical
> Attachments: 15398-test.txt, HBASE-15398-v2.patch,
> HBASE-15398-v3.patch, HBASE-15398-v4.patch, HBASE-15398.v1.txt
>
>
> In RegionScannerImpl we have two heaps, storeHeap and joinedHeap. If we have
> a filter that doesn't apply to all column families, the stores whose families
> needn't be filtered go into joinedHeap. We scan storeHeap first, then
> joinedHeap, then merge the results, sort them and return them to the client.
> We need to sort because Cells are ordered by rowkey/cf/cq/ts and a smaller cf
> may be in joinedHeap.
> However, after HBASE-11544 we may transfer partial results when we get
> SIZE_LIMIT_REACHED_MID_ROW or a similar state. We may return a larger cf
> first because it is in storeHeap, and then a smaller cf because it is in
> joinedHeap. The server doesn't hold all cells of a row and the client has no
> sorting logic, so the order of cfs in the Result the user sees is wrong.
> A more critical bug: if we get a LIMIT_REACHED_MID_ROW on the last cell of a
> row in storeHeap, we break out of scanning in RegionScannerImpl, and in
> populateResult we change the state to SIZE_LIMIT_REACHED because the next
> peeked cell belongs to the next row. But this is only the last cell of one
> heap, and we have two. SIZE_LIMIT_REACHED means this Result is not partial
> (per ScannerContext.partialResultFormed), so the client merges what it has
> and returns it to the user, losing the data in joinedHeap. On the next scan
> we read the next row of storeHeap, and joinedHeap's cells for this row are
> forgotten and never read.
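
The ordering problem in the description can be reproduced with a toy model in
plain Java. This is only a sketch under stated assumptions: cells are
"row/family:qualifier" strings rather than HBase Cells, natural String order
stands in for the Cell comparator, and TwoHeapOrderDemo with all of its methods
is hypothetical. It models only the disorder case, not the data-loss case.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Toy model only; cells are plain strings, not HBase Cells. */
final class TwoHeapOrderDemo {

    /** Whole-row path: read storeHeap, then joinedHeap, then sort before returning. */
    static List<String> wholeRow(List<String> storeHeap, List<String> joinedHeap) {
        List<String> merged = new ArrayList<>(storeHeap);
        merged.addAll(joinedHeap);
        Collections.sort(merged); // stand-in for sorting with the Cell comparator
        return merged;
    }

    /** Partial path: cells reach the client in read order, with no sort anywhere. */
    static List<String> partialConcatenated(List<String> storeHeap, List<String> joinedHeap) {
        List<String> seenByClient = new ArrayList<>(storeHeap); // earlier partial result(s)
        seenByClient.addAll(joinedHeap);                        // later partial result(s)
        return seenByClient;
    }

    /** Returns true if the cells are in non-decreasing order. */
    static boolean isSorted(List<String> cells) {
        for (int i = 1; i < cells.size(); i++) {
            if (cells.get(i).compareTo(cells.get(i - 1)) < 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Assume the filter applies to cf2 only, so cf2 is in storeHeap
        // and the smaller family cf1 is in joinedHeap.
        List<String> storeHeap = List.of("r1/cf2:a", "r1/cf2:b");
        List<String> joinedHeap = List.of("r1/cf1:a");

        System.out.println("whole row sorted: " + isSorted(wholeRow(storeHeap, joinedHeap)));
        System.out.println("partials sorted:  " + isSorted(partialConcatenated(storeHeap, joinedHeap)));
    }
}
```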