[ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16062150#comment-16062150 ]
Ted Yu commented on HBASE-17125:
--------------------------------

{code}
+    final WAL wal = HBaseTestingUtility.createWal(TEST_UTIL.getConfiguration(), logDir, info);
+    this.region = TEST_UTIL.createLocalHRegion(info, htd, wal);
{code}
The above code relies on another test to initialize the chunk creator. If you run the subtest alone, you would observe an NPE like the following:
{code}
MemStoreLABImpl.getOrMakeChunk() line: 242
MemStoreLABImpl.copyCellInto(Cell) line: 118
MutableSegment(Segment).maybeCloneWithAllocator(Cell) line: 168
CompactingMemStore(AbstractMemStore).maybeCloneWithAllocator(Cell) line: 268
CompactingMemStore(AbstractMemStore).add(Cell, MemstoreSize) line: 107
CompactingMemStore(AbstractMemStore).add(Iterable<Cell>, MemstoreSize) line: 101
HStore.add(Iterable<Cell>, MemstoreSize) line: 711
HRegion.applyToMemstore(Store, List<Cell>, boolean, MemstoreSize) line: 4001
HRegion.applyFamilyMapToMemstore(Map<byte[],List<Cell>>, MemstoreSize) line: 3984
HRegion.doMiniBatchMutate(BatchOperation<?>) line: 3439
HRegion.batchMutate(BatchOperation<?>) line: 3131
HRegion.batchMutate(Mutation[], long, long) line: 3073
HRegion.batchMutate(Mutation[]) line: 3077
HRegion.doBatchMutate(Mutation) line: 3827
HRegion.put(Put) line: 2950
TestHRegion.testGetWithFilter() line: 2665
{code}
The following would allow the subtest to run alone:
{code}
+    ChunkCreator.initialize(MemStoreLABImpl.CHUNK_SIZE_DEFAULT, false, 0, 0, 0, null);
+    this.region = TEST_UTIL.createLocalHRegion(info, htd, wal);
{code}

> Inconsistent result when use filter to read data
> ------------------------------------------------
>
>                 Key: HBASE-17125
>                 URL: https://issues.apache.org/jira/browse/HBASE-17125
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Guanghao Zhang
>            Assignee: Guanghao Zhang
>            Priority: Critical
>             Fix For: 3.0.0, 2.0.0-alpha-2
>
>         Attachments: 17125-slack-13.txt, example.diff,
> HBASE-17125.master.001.patch, HBASE-17125.master.002.patch,
> HBASE-17125.master.002.patch,
> HBASE-17125.master.003.patch,
> HBASE-17125.master.004.patch, HBASE-17125.master.005.patch,
> HBASE-17125.master.006.patch, HBASE-17125.master.007.patch,
> HBASE-17125.master.008.patch, HBASE-17125.master.009.patch,
> HBASE-17125.master.009.patch, HBASE-17125.master.010.patch,
> HBASE-17125.master.011.patch, HBASE-17125.master.011.patch,
> HBASE-17125.master.012.patch, HBASE-17125.master.013.patch,
> HBASE-17125.master.014.patch, HBASE-17125.master.015.patch,
> HBASE-17125.master.016.patch, HBASE-17125.master.017.patch,
> HBASE-17125.master.checkReturnedVersions.patch,
> HBASE-17125.master.no-specified-filter.patch
>
>
> Assume a column's max versions is 3, and we write 4 versions of this column.
> The oldest version is not removed immediately, but from the user's view the
> oldest version is gone. When a user queries with a filter and the filter
> skips a newer version, the oldest version is seen again. After the region is
> compacted, the oldest version is never seen again. So the query returns
> inconsistent results before and after region compaction, which is confusing
> for users.
> The cause is the matchColumn method of UserScanQueryMatcher. It first checks
> the cell with the filter, then checks the number of versions needed. So if
> the filter skips a newer version, the oldest version is seen again as long
> as it has not been removed.
> After an offline discussion with [~Apache9] and [~fenghh], we have two
> solutions for this problem. The first idea is to check the number of
> versions first, then check the cell with the filter. This matches the
> comment on setFilter, which says the filter is called after all tests for
> ttl, column match, deletes and max versions have been run.
> {code}
>   /**
>    * Apply the specified server-side filter when performing the Query.
>    * Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>    * for ttl, column match, deletes and max versions have been run.
>    * @param filter filter to run on the server
>    * @return this for invocation chaining
>    */
>   public Query setFilter(Filter filter) {
>     this.filter = filter;
>     return this;
>   }
> {code}
> But this idea has another problem. Suppose a column's max versions is 5 and
> the user's query needs only 3 versions. The matcher first checks the version
> count, then checks the cell with the filter, so the result may contain fewer
> than 3 cells even though there are 2 more versions that are never read.
> So the second idea has three steps:
> 1. check against the max versions of this column
> 2. check the kv with the filter
> 3. check against the number of versions the user needs
> But this makes the ScanQueryMatcher more complicated, and it breaks the
> javadoc of Query.setFilter.
> We don't have a final solution for this problem yet. Suggestions are welcome.
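
To make the three orderings discussed in the description concrete, here is a minimal, self-contained sketch. It is not HBase code: the class VersionCheckOrder, its method names, and the model of a column as a list of timestamps (newest first) are all hypothetical, and it assumes the number of versions the user asks for never exceeds the column's max versions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongPredicate;

// Hypothetical model, NOT the real UserScanQueryMatcher.
// A column is a list of cell timestamps, ordered newest first.
final class VersionCheckOrder {

    // Behavior described in the issue: filter first, then count versions.
    // Only cells that pass the filter count toward the wanted versions.
    static List<Long> filterThenVersions(List<Long> cells, int wanted, LongPredicate filter) {
        List<Long> out = new ArrayList<>();
        for (long ts : cells) {
            if (out.size() >= wanted) break;
            if (filter.test(ts)) out.add(ts);
        }
        return out;
    }

    // First idea: count every cell toward the wanted versions, then filter.
    static List<Long> versionsThenFilter(List<Long> cells, int wanted, LongPredicate filter) {
        List<Long> out = new ArrayList<>();
        int seen = 0;
        for (long ts : cells) {
            if (++seen > wanted) break;          // version check comes first
            if (filter.test(ts)) out.add(ts);    // filter is applied afterwards
        }
        return out;
    }

    // Second idea: 1. column max versions, 2. filter, 3. versions the user needs.
    static List<Long> threeStep(List<Long> cells, int columnMax, int wanted, LongPredicate filter) {
        List<Long> out = new ArrayList<>();
        int seen = 0;
        for (long ts : cells) {
            if (++seen > columnMax) break;       // step 1: hide versions beyond the column max
            if (out.size() >= wanted) break;     // step 3: stop once the user has enough
            if (filter.test(ts)) out.add(ts);    // step 2: apply the filter
        }
        return out;
    }
}
```

In this model, with max versions 3, stored timestamps 4, 3, 2, 1 before compaction and 4, 3, 2 after, and a filter that skips timestamp 4, filterThenVersions returns 3, 2, 1 before compaction but 3, 2 after, while threeStep returns 3, 2 in both cases. With max versions 5, a user asking for 3 versions, and a filter that skips the newest cell, versionsThenFilter returns only 2 cells whereas threeStep returns 3.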