bbeaudreault opened a new pull request, #4940: URL: https://github.com/apache/hbase/pull/4940
I still need to add unit tests, but this works in a deployed cluster. Submitting early in case anyone wants to comment on the approach.

The basic premise is that we currently call Shipper.shipped() only at the final stage of returning an RPC. This was considered the only place where it was safe to release blocks, which is true for blocks referenced by the result. If we can be sure that a block is not referenced by any returned cell, we should be able to release it early. Shipper will still do the final release of anything retained in `prevBlocks`.

RegionScannerImpl and StoreScanner both scan a row at a time and enforce different levels of filters.

StoreScanner enforces most of the per-cell filters with a hot loop over the cells in the row for that store. At this level we can simply release the block if no cells are matched with qcode `INCLUDE*`. We do this by calling `heap.retainBlock()` when including a result, which sets a boolean in HFileReaderImpl. When HFileReaderImpl progresses to the next block, it adds the old block to `prevBlocks` only if `retainBlock()` has been called. Otherwise it releases the block at that point.

The RegionScannerImpl level enforces `filterRow()` filters, since it has the full set of cells for the row. If the row is deemed filtered, the `results` list is cleared. We handle this level by adding a checkpoint system. At the start of a RegionScannerImpl.next call, `checkpoint(START)` is called. This sets an index in HFileReaderImpl based on the current size of `prevBlocks`. Once RegionScannerImpl has decided to filter a row, we call `checkpoint(FILTERED)`, which releases any blocks that were added to `prevBlocks` while building the now-cleared result list.

To ensure these changes don't have adverse effects on unexpected use cases of HFileReaderImpl, the new behavior is only triggered once `checkpoint(State)` is called on a scanner.
Prior to calling checkpoint, the old behavior will remain in place where all old blocks are added to prevBlocks and only released by calling `shipped()`.
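
To make the bookkeeping easier to comment on, here is a minimal standalone sketch of the release rules described above. It is a toy model, not the code in this PR: the class `BlockTrackerSketch`, the `advanceTo` method, the `released` list and the use of strings as blocks are all hypothetical. Only the names `retainBlock`, `checkpoint`, `shipped`, `prevBlocks`, `START` and `FILTERED` come from the description.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the block bookkeeping described above; hypothetical, not HBase code. */
class BlockTrackerSketch {
    enum State { START, FILTERED }

    private final List<String> prevBlocks = new ArrayList<>(); // held until shipped()
    private final List<String> released = new ArrayList<>();   // release order, for inspection
    private String curBlock;                  // block currently being read
    private boolean retainCurrent = false;    // set when a cell from curBlock is included
    private boolean checkpointing = false;    // new behavior is opt-in
    private int checkpointIndex = 0;          // size of prevBlocks at the last START

    /** Called when a cell from the current block is included in the results. */
    void retainBlock() {
        retainCurrent = true;
    }

    /** Hypothetical hook: the reader moves on to the next block. */
    void advanceTo(String nextBlock) {
        if (curBlock != null) {
            if (!checkpointing || retainCurrent) {
                // Old behavior, or a returned cell may reference this block: keep it.
                prevBlocks.add(curBlock);
            } else {
                // No included cell came from this block: release it early.
                released.add(curBlock);
            }
        }
        curBlock = nextBlock;
        retainCurrent = false;
    }

    void checkpoint(State state) {
        checkpointing = true; // the first call switches the new behavior on
        if (state == State.START) {
            checkpointIndex = prevBlocks.size();
        } else {
            // FILTERED: the row's results were cleared, so blocks retained since
            // START are no longer referenced. The current block is left alone
            // because an earlier, unfiltered row may still reference it.
            List<String> sinceStart = prevBlocks.subList(checkpointIndex, prevBlocks.size());
            released.addAll(sinceStart);
            sinceStart.clear();
        }
    }

    /** Final release of everything still retained, as Shipper does today. */
    void shipped() {
        released.addAll(prevBlocks);
        prevBlocks.clear();
        checkpointIndex = 0;
    }

    List<String> prevBlocks() { return prevBlocks; }

    List<String> released() { return released; }
}
```

In this model, a block that is left behind before any `checkpoint` call always lands in `prevBlocks`. After `checkpoint(START)`, a block is kept only if `retainBlock()` was called while it was current, and `checkpoint(FILTERED)` releases whatever was kept since the matching `START`.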
