bbeaudreault opened a new pull request, #4940:
URL: https://github.com/apache/hbase/pull/4940

   I still need to add unit tests, but this works in a deployed cluster. 
Submitting early in case anyone wants to comment on the approach.
   
   The basic premise is that we currently only call Shipper.shipped() at the
final stage of returning an RPC. This was considered the only place where it
was safe to release blocks, which is true for blocks referenced by the result.
If we can be sure that a block is not referenced by any returned cell, we
should be able to release it early. Shipper will still do the final release of
anything retained in `prevBlocks`.
   
   RegionScannerImpl and StoreScanner both scan one row at a time and enforce
different levels of filters. StoreScanner enforces most of the per-cell filters
in a hot loop over the cells in the row for that store. At this level we can
simply release the block if no cells are matched with qcode `INCLUDE*`. We do
this by calling `heap.retainBlock()` when including a result, which sets a
boolean in HFileReaderImpl. When HFileReaderImpl progresses to the next block,
it will only add the old block to `prevBlocks` if `retainBlock()` had been
called. Otherwise it releases the old block at that point.
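
   To make the retain flag concrete, here is a minimal toy sketch of the
bookkeeping described above. It is not the code in this PR: `SketchBlock`,
`RetainSketchScanner` and `advanceTo()` are hypothetical names made up for
illustration, and real blocks are ref-counted buffers rather than a boolean.
Only `retainBlock()`, `prevBlocks` and `shipped()` correspond to names used in
the description.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy stand-in for a ref-counted block; NOT the real HFileBlock. */
class SketchBlock {
  boolean released = false;

  void release() {
    released = true;
  }
}

/** Hypothetical, simplified model of the scanner-side bookkeeping. */
class RetainSketchScanner {
  final List<SketchBlock> prevBlocks = new ArrayList<>();
  private SketchBlock curBlock;
  private boolean retainCurrent = false;

  /** Called by the layer above when a cell from the current block is included. */
  void retainBlock() {
    retainCurrent = true;
  }

  /** Moves to the next block and decides what happens to the old one. */
  void advanceTo(SketchBlock next) {
    if (curBlock != null) {
      if (retainCurrent) {
        // The old block may back a returned cell, so defer release to shipped().
        prevBlocks.add(curBlock);
      } else {
        // Nothing included from the old block, so it can be freed right away.
        curBlock.release();
      }
    }
    curBlock = next;
    retainCurrent = false;
  }

  /** Final release of everything that was retained. */
  void shipped() {
    for (SketchBlock b : prevBlocks) {
      b.release();
    }
    prevBlocks.clear();
  }
}
```

   In this sketch, a block that is stepped past without a `retainBlock()` call
is released inside `advanceTo()`, while a retained block stays in `prevBlocks`
until `shipped()` runs.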
   
   The RegionScannerImpl level enforces `filterRow()` filters, since it has the 
full set of cells for the row. If the row is deemed filtered, the `results` 
list is cleared. We handle this level by adding a checkpoint system. At the 
start of a RegionScannerImpl.next call, `checkpoint(START)` is called. This 
sets an index in HFileReaderImpl based on the current size of `prevBlocks`. 
Once RegionScannerImpl has decided to filter a row, we call
`checkpoint(FILTERED)`, which releases any blocks that were added to
`prevBlocks` while building the now-cleared result list.
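
   As a rough illustration of the checkpoint idea (again a toy sketch, not the
PR's implementation), the index recorded at `START` marks where the current
row's blocks begin in `prevBlocks`, and `FILTERED` releases everything from
that index onward. The class name `CheckpointSketch`, the nested `Block` type
and the `retain()` helper are assumptions made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the START/FILTERED checkpoint bookkeeping. */
class CheckpointSketch {
  enum State { START, FILTERED }

  /** Toy stand-in for a ref-counted block. */
  static class Block {
    boolean released = false;

    void release() {
      released = true;
    }
  }

  final List<Block> prevBlocks = new ArrayList<>();
  private int checkpointIndex = 0;

  /** Simulates a block being kept because a cell from it was included. */
  void retain(Block b) {
    prevBlocks.add(b);
  }

  void checkpoint(State state) {
    if (state == State.START) {
      // Remember how many blocks were already retained before this row.
      checkpointIndex = prevBlocks.size();
    } else {
      // The row was filtered: blocks added since START back no returned cells.
      List<Block> addedForRow = prevBlocks.subList(checkpointIndex, prevBlocks.size());
      for (Block b : addedForRow) {
        b.release();
      }
      addedForRow.clear(); // also removes them from prevBlocks
    }
  }
}
```

   Blocks retained before `START` belong to earlier rows that were returned, so
the sketch leaves them alone until the final `shipped()` call.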
   
   To ensure these changes don't have adverse effects on unexpected use cases
of HFileReaderImpl, the new behavior is only triggered once `checkpoint(State)`
is called on a scanner. Until checkpoint is called, the old behavior remains in
place: all old blocks are added to `prevBlocks` and only released by calling
`shipped()`.
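
   The opt-in gating could look roughly like the following toy sketch. The
`checkpointingEnabled` flag, the `OptInSketchScanner` class and `advanceTo()`
are hypothetical names for illustration only, and `checkpoint()` is reduced to
a no-argument method that does nothing but flip the flag.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model: early release only after checkpoint() has been called. */
class OptInSketchScanner {
  /** Toy stand-in for a ref-counted block. */
  static class Block {
    boolean released = false;

    void release() {
      released = true;
    }
  }

  final List<Block> prevBlocks = new ArrayList<>();
  private boolean checkpointingEnabled = false; // flips on the first checkpoint call
  private boolean retainCurrent = false;
  private Block curBlock;

  void checkpoint() {
    checkpointingEnabled = true;
  }

  void retainBlock() {
    retainCurrent = true;
  }

  void advanceTo(Block next) {
    if (curBlock != null) {
      if (!checkpointingEnabled || retainCurrent) {
        // Old behavior, or the block was explicitly retained: keep until shipped().
        prevBlocks.add(curBlock);
      } else {
        curBlock.release();
      }
    }
    curBlock = next;
    retainCurrent = false;
  }
}
```

   A caller that never calls `checkpoint()` sees every old block land in
`prevBlocks`, which matches the pre-existing behavior.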
   


