[ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15887066#comment-15887066 ]
huaxiang sun commented on HBASE-17125:
--------------------------------------
Hi [~zghaobac], regarding "But this idea has another problem: if a column's
max versions is 5 and the user's query only needs 3 versions, it first checks
the number of versions, then checks the cell with the filter. So the number of
cells in the result may be less than 3, even though there are 2 more versions
that are never read." I think this is caused by scan.maxVersions and the HCD's
maxVersions having different meanings? The implementation implies that they
are the same. Thanks.
From doc:
Scan setMaxVersions(int maxVersions)
Get up to the specified number of versions of each column.
The implementation:
https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/querymatcher/UserScanQueryMatcher.java#L197
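
To make the point concrete, here is a minimal stand-alone sketch of the two orderings. It does not use any HBase classes: the class and method names are hypothetical, versions are modelled as plain timestamps sorted newest first, and the filter is modelled as a predicate. It assumes a column family that keeps 5 versions and a scan that asks for 3.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongPredicate;

// Hypothetical model for illustration only; not the UserScanQueryMatcher code.
final class VersionLimitSketch {

    // Versions-first: every cell consumes the scan's version budget,
    // including cells that the filter later rejects.
    static List<Long> limitThenFilter(List<Long> newestFirst, int scanVersions,
                                      LongPredicate filter) {
        List<Long> out = new ArrayList<>();
        int seen = 0;
        for (long ts : newestFirst) {
            if (seen >= scanVersions) {
                break;
            }
            seen++;
            if (filter.test(ts)) {
                out.add(ts);
            }
        }
        return out;
    }

    // Filter-first: only cells accepted by the filter count towards the limit.
    static List<Long> filterThenLimit(List<Long> newestFirst, int scanVersions,
                                      LongPredicate filter) {
        List<Long> out = new ArrayList<>();
        for (long ts : newestFirst) {
            if (out.size() >= scanVersions) {
                break;
            }
            if (filter.test(ts)) {
                out.add(ts);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Long> stored = List.of(5L, 4L, 3L, 2L, 1L); // 5 stored versions
        LongPredicate skipNewest = ts -> ts != 5L;        // filter rejects ts=5
        System.out.println(limitThenFilter(stored, 3, skipNewest)); // 2 cells
        System.out.println(filterThenLimit(stored, 3, skipNewest)); // 3 cells
    }
}
```

With versions-first, the user asks for "up to 3 versions" and gets 2, although two older versions that pass the filter are still stored. That only looks wrong if scan.maxVersions is read as "3 versions after filtering" rather than "the newest 3 versions".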
> Inconsistent result when use filter to read data
> ------------------------------------------------
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
> Issue Type: Bug
> Reporter: Guanghao Zhang
> Attachments: example.diff
>
>
> Assume a column's max versions is 3, and we write 4 versions of this column.
> The oldest version is not removed immediately, but from the user's view the
> oldest version is gone. When the user queries with a filter and the filter
> skips a newer version, the oldest version is seen again. After the region is
> compacted, however, the oldest version is never seen again. This is confusing
> for the user: the query returns inconsistent results before and after region
> compaction.
> The reason is the matchColumn method of UserScanQueryMatcher. It first checks
> the cell with the filter, then checks the number of versions needed. So if the
> filter skips a newer version, the oldest version is seen again as long as it
> has not been removed.
> After an offline discussion with [~Apache9] and [~fenghh], we have two
> solutions for this problem. The first idea is to check the number of versions
> first, then check the cell with the filter. This matches the comment on
> setFilter, which says the filter is called after all tests for ttl, column
> match, deletes and max versions have been run.
> {code}
>   /**
>    * Apply the specified server-side filter when performing the Query.
>    * Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>    * for ttl, column match, deletes and max versions have been run.
>    * @param filter filter to run on the server
>    * @return this for invocation chaining
>    */
>   public Query setFilter(Filter filter) {
>     this.filter = filter;
>     return this;
>   }
> {code}
> But this idea has another problem: if a column's max versions is 5 and the
> user's query only needs 3 versions, it first checks the number of versions,
> then checks the cell with the filter. So the number of cells in the result may
> be less than 3, even though there are 2 more versions that are never read.
> So the second idea has three steps:
> 1. Check against the max versions of this column.
> 2. Check the KV with the filter.
> 3. Check against the number of versions the user needs.
> But this makes the ScanQueryMatcher more complicated, and it breaks the
> javadoc of Query.setFilter.
> We don't have a final solution for this problem yet. Suggestions are welcome.
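
For reference, the three-step ordering from the description can be sketched as below. This is a stand-alone model and not a patch against ScanQueryMatcher: the class and method names are hypothetical, versions are plain timestamps sorted newest first, the filter is a predicate, and "before compaction" is modelled simply as one extra, not yet removed version in the list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongPredicate;

// Hypothetical model of the second idea; not the actual matcher code.
final class ThreeStepMatchSketch {

    static List<Long> match(List<Long> newestFirst, int columnMaxVersions,
                            int scanVersions, LongPredicate filter) {
        List<Long> out = new ArrayList<>();
        int seen = 0;
        for (long ts : newestFirst) {
            // Step 3: stop once the user has the number of versions requested.
            if (out.size() >= scanVersions) {
                break;
            }
            // Step 1: versions beyond the column's max versions are logically
            // gone, even if compaction has not removed them yet.
            if (seen >= columnMaxVersions) {
                break;
            }
            seen++;
            // Step 2: apply the filter to the surviving cell.
            if (filter.test(ts)) {
                out.add(ts);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        LongPredicate skipNewest = ts -> ts != 4L; // filter rejects ts=4
        List<Long> beforeCompaction = List.of(4L, 3L, 2L, 1L); // ts=1 not removed yet
        List<Long> afterCompaction = List.of(4L, 3L, 2L);
        // Column max versions = 3, scan asks for 3 versions.
        System.out.println(match(beforeCompaction, 3, 3, skipNewest));
        System.out.println(match(afterCompaction, 3, 3, skipNewest));
    }
}
```

In this model both calls return the same cells, so the result no longer depends on whether compaction has run, and a scan asking for 3 versions of a 5-version column can still get 3 cells when the filter rejects a newer one.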
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)