[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058430#comment-16058430
 ] 

Guanghao Zhang commented on HBASE-17125:
----------------------------------------

bq. If the filter decides to skip a version, then reduce the returned count in 
ColumnTracker.
This approach is too tricky and bug-prone, so I uploaded a new patch 
(checkReturnedVersions.patch) that uses the second idea from the description.
It matches a column in three steps:
1. Check the column family's max versions.
2. Check by filter.
3. Check the returned versions. (This can be set by the user.)
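
A rough sketch of the three steps above as a toy model, to show why the order gives the same result before and after compaction. This is not the code in the patch: the class ThreeStepMatch, the method match() and its parameters are hypothetical names, and a version is reduced to a bare timestamp.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongPredicate;

/** Toy model of the three-step check; not HBase's actual ScanQueryMatcher. */
final class ThreeStepMatch {

    /**
     * @param newestFirst   timestamps of one column's versions, newest first
     * @param cfMaxVersions the column family's max versions (step 1)
     * @param filter        returns true if the filter includes the version (step 2)
     * @param userVersions  how many versions the user asked for (step 3)
     */
    static List<Long> match(long[] newestFirst, int cfMaxVersions,
                            LongPredicate filter, int userVersions) {
        List<Long> result = new ArrayList<>();
        int seen = 0;
        for (long ts : newestFirst) {
            // Step 1: versions beyond the column family's limit are logically
            // gone, even if compaction has not physically removed them yet.
            if (++seen > cfMaxVersions) {
                break;
            }
            // Step 2: let the filter decide on the surviving version.
            if (!filter.test(ts)) {
                continue;
            }
            // Step 3: stop once the user has the requested number of versions.
            if (result.size() >= userVersions) {
                break;
            }
            result.add(ts);
        }
        return result;
    }

    public static void main(String[] args) {
        LongPredicate skipNewest = ts -> ts != 4L;
        // Before compaction: 4 versions stored, column family max versions is 3.
        System.out.println(match(new long[] {4, 3, 2, 1}, 3, skipNewest, 3)); // [3, 2]
        // After compaction: the oldest version has been physically removed.
        System.out.println(match(new long[] {4, 3, 2}, 3, skipNewest, 3));    // [3, 2]
    }
}
```

In this model the oldest version (timestamp 1) is cut off by step 1 before the filter ever sees it, so skipping the newest version in step 2 cannot bring it back.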

About the setFilter() javadoc: it says "called AFTER all tests
for ttl, column match, deletes and max versions have been run." After talking 
with [~yangzhe1991] and [~Apache9], we think "max versions" is easy to 
misunderstand, because the column family has a max versions config and the 
user can also set max versions on a scan. So in the new patch I updated the 
javadoc of setFilter() to read "called AFTER all tests for ttl, column match, 
deletes and column family's max versions have been run". I also added a new 
method setVersions() to scan, which sets how many versions will be returned 
to the user, and marked setMaxVersions() as @deprecated. Thanks. 

> Inconsistent result when use filter to read data
> ------------------------------------------------
>
>                 Key: HBASE-17125
>                 URL: https://issues.apache.org/jira/browse/HBASE-17125
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Guanghao Zhang
>            Assignee: Guanghao Zhang
>            Priority: Critical
>             Fix For: 2.0.0
>
>         Attachments: example.diff, HBASE-17125.master.001.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.003.patch, HBASE-17125.master.004.patch, 
> HBASE-17125.master.005.patch, HBASE-17125.master.006.patch, 
> HBASE-17125.master.007.patch, HBASE-17125.master.008.patch, 
> HBASE-17125.master.009.patch, HBASE-17125.master.009.patch, 
> HBASE-17125.master.010.patch, HBASE-17125.master.011.patch, 
> HBASE-17125.master.011.patch, HBASE-17125.master.checkReturnedVersions.patch, 
> HBASE-17125.master.no-specified-filter.patch
>
>
> Assume a column's max versions is 3, and we write 4 versions of this column. 
> The oldest version is not removed immediately, but from the user's view it 
> is gone. When a user queries with a filter and the filter skips a newer 
> version, the oldest version becomes visible again. After the region is 
> compacted, the oldest version is never seen again. This is confusing for 
> users: the query returns inconsistent results before and after region 
> compaction.
> The cause is the matchColumn method of UserScanQueryMatcher. It first checks 
> the cell against the filter, then checks the number of versions needed. So if 
> the filter skips a newer version, the oldest version is seen again as long as 
> it has not been removed.
> After an offline discussion with [~Apache9] and [~fenghh], we have two 
> possible solutions for this problem. The first idea is to check the number of 
> versions first, then check the cell against the filter. This matches the 
> javadoc of setFilter, which says the filter is called after all tests for 
> ttl, column match, deletes and max versions have been run.
> {code}
>   /**
>    * Apply the specified server-side filter when performing the Query.
>    * Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>    * for ttl, column match, deletes and max versions have been run.
>    * @param filter filter to run on the server
>    * @return this for invocation chaining
>    */
>   public Query setFilter(Filter filter) {
>     this.filter = filter;
>     return this;
>   }
> {code}
> But this idea has another problem. If a column's max versions is 5 and the 
> user's query only needs 3 versions, it first checks the version count, then 
> checks the cell against the filter. So the result may contain fewer than 3 
> cells, even though there are 2 more versions that are never read.
> So the second idea has three steps.
> 1. Check against the max versions of this column.
> 2. Check the kv against the filter.
> 3. Check the number of versions the user needs.
> But this will make the ScanQueryMatcher more complicated, and it will break 
> the javadoc of Query.setFilter.
> We don't have a final solution for this problem yet. Suggestions are welcome.
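
For reference, the two orderings discussed in the quoted description can be compared in a small toy model. This is only a sketch under simplifying assumptions, not HBase code: MatchOrderDemo, filterThenCount() and countThenFilter() are hypothetical names, a version is reduced to its timestamp, and a single limit stands in for the version count being checked.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongPredicate;

/** Toy model of the two check orders from the issue; not HBase code. */
final class MatchOrderDemo {

    /** Filter first, then count versions: the order the issue describes for matchColumn. */
    static List<Long> filterThenCount(long[] newestFirst, LongPredicate filter, int limit) {
        List<Long> result = new ArrayList<>();
        for (long ts : newestFirst) {
            if (!filter.test(ts)) {
                continue; // a skipped version is not counted against the limit
            }
            if (result.size() >= limit) {
                break;
            }
            result.add(ts);
        }
        return result;
    }

    /** Count versions first, then filter: the first idea in the issue. */
    static List<Long> countThenFilter(long[] newestFirst, LongPredicate filter, int limit) {
        List<Long> result = new ArrayList<>();
        int seen = 0;
        for (long ts : newestFirst) {
            if (++seen > limit) {
                break; // the limit is used up, whether or not the filter kept the cells
            }
            if (filter.test(ts)) {
                result.add(ts);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Max versions 3, 4 versions written, filter skips the newest one.
        LongPredicate skip4 = ts -> ts != 4L;
        System.out.println(filterThenCount(new long[] {4, 3, 2, 1}, skip4, 3)); // [3, 2, 1]
        System.out.println(filterThenCount(new long[] {4, 3, 2}, skip4, 3));    // [3, 2]

        // Max versions 5, user asks for 3, filter skips the newest one.
        LongPredicate skip5 = ts -> ts != 5L;
        System.out.println(countThenFilter(new long[] {5, 4, 3, 2, 1}, skip5, 3)); // [4, 3]
    }
}
```

The first two calls show the oldest version reappearing only while it has not been compacted away. The third shows the drawback of the first idea: two cells are returned although three unfiltered versions are still within the column's max versions.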



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
