Guanghao Zhang updated HBASE-17125:
       Resolution: Fixed
    Fix Version/s: 3.0.0
           Status: Resolved  (was: Patch Available)

> Inconsistent result when use filter to read data
> ------------------------------------------------
>                 Key: HBASE-17125
>                 URL: https://issues.apache.org/jira/browse/HBASE-17125
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Guanghao Zhang
>            Assignee: Guanghao Zhang
>            Priority: Critical
>             Fix For: 2.0.0, 3.0.0
>         Attachments: 17125-slack-13.txt, example.diff, 
> HBASE-17125.master.001.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.003.patch, 
> HBASE-17125.master.004.patch, HBASE-17125.master.005.patch, 
> HBASE-17125.master.006.patch, HBASE-17125.master.007.patch, 
> HBASE-17125.master.008.patch, HBASE-17125.master.009.patch, 
> HBASE-17125.master.009.patch, HBASE-17125.master.010.patch, 
> HBASE-17125.master.011.patch, HBASE-17125.master.011.patch, 
> HBASE-17125.master.012.patch, HBASE-17125.master.013.patch, 
> HBASE-17125.master.014.patch, HBASE-17125.master.015.patch, 
> HBASE-17125.master.016.patch, HBASE-17125.master.017.patch, 
> HBASE-17125.master.018.patch, HBASE-17125.master.019.patch, 
> HBASE-17125.master.020.patch, HBASE-17125.master.020.patch, 
> HBASE-17125.master.021.patch, HBASE-17125.master.022.patch, 
> HBASE-17125.master.checkReturnedVersions.patch, 
> HBASE-17125.master.no-specified-filter.patch
> Assume a cloumn's max versions is 3, then we write 4 versions of this column. 
> The oldest version doesn't remove immediately. But from the user view, the 
> oldest version has gone. When user use a filter to query, if the filter skip 
> a new version, then the oldest version will be seen again. But after compact 
> the region, then the oldest version will never been seen. So it is weird for 
> user. The query will get inconsistent result before and after region 
> compaction.
> The reason is matchColumn method of UserScanQueryMatcher. It first check the 
> cell by filter, then check the number of versions needed. So if the filter 
> skip the new version, then the oldest version will be seen again when it is 
> not removed.
> Have a discussion offline with [~Apache9] and [~fenghh], now we have two 
> solution for this problem. The first idea is check the number of versions 
> first, then check the cell by filter. As the comment of setFilter, the filter 
> is called after all tests for ttl, column match, deletes and max versions 
> have been run.
> {code}
>   /**
>    * Apply the specified server-side filter when performing the Query.
>    * Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>    * for ttl, column match, deletes and max versions have been run.
>    * @param filter filter to run on the server
>    * @return this for invocation chaining
>    */
>   public Query setFilter(Filter filter) {
>     this.filter = filter;
>     return this;
>   }
> {code}
> But this idea has another problem, if a column's max version is 5 and the 
> user query only need 3 versions. It first check the version's number, then 
> check the cell by filter. So the cells number of the result may less than 3. 
> But there are 2 versions which don't read anymore.
> So the second idea has three steps.
> 1. check by the max versions of this column
> 2. check the kv by filter
> 3. check the versions which user need.
> But this will lead the ScanQueryMatcher more complicated. And this will break 
> the javadoc of Query.setFilter.
> Now we don't have a final solution for this problem. Suggestions are welcomed.

This message was sent by Atlassian JIRA

Reply via email to