[ https://issues.apache.org/jira/browse/HBASE-27227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17649066#comment-17649066 ]

Bryan Beaudreault commented on HBASE-27227:
-------------------------------------------

I've been looking at this again. My initial thought was to find a way to 
release the buffers as we iterate over filtered rows. Now I'm thinking we might 
be able to solve two issues here with an improvement to 
ScannerContext/progress.

Currently we keep dataSize and heapSize progress in ScannerContext 
ProgressFields. These only get incremented for cells that are included in the 
result.

I think we should add a blockSize field to progress, and increment that for all 
cells. We'd need some small changes in StoreScanner and RegionScannerImpl.
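A rough, self-contained sketch of what I mean (names here are made up for illustration, not the actual ScannerContext API):

```java
// Hypothetical, simplified model of ScannerContext progress tracking.
// blockSizeProgress is the proposed new field: unlike dataSize/heapSize,
// it would be incremented for every cell scanned, filtered or not.
public class ScannerProgress {
    private long dataSizeProgress;   // bytes of cells included in the result
    private long heapSizeProgress;   // heap overhead of included cells
    private long blockSizeProgress;  // proposed: block bytes read for ALL cells

    // Called only when a cell passes filters and is added to the result.
    public void incrementSizeProgress(long dataSize, long heapSize) {
        this.dataSizeProgress += dataSize;
        this.heapSizeProgress += heapSize;
    }

    // Proposed: called for every cell regardless of filtering, so block IO
    // cost is tracked even when nothing is being returned to the client.
    public void incrementBlockProgress(long blockBytes) {
        this.blockSizeProgress += blockBytes;
    }

    public long getDataSizeProgress() { return dataSizeProgress; }

    public long getBlockSizeProgress() { return blockSizeProgress; }
}
```

The point is that a scan which filters everything still accumulates block progress, so we have something to check limits against.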

This solves two problems, because the other issue I realized is that Quotas 
currently only get enforced based on the size of the response, when they really 
should account for the total IO cost of a query. Along with adding blockSize to 
ScannerContext ProgressFields, we'd also add it to LimitFields.

This way, all of the checkSizeLimit checks we already have would immediately be 
able to enforce max sizes against block IO. This would solve the original issue 
here because a heavily filtered scan would be forced to return early if it 
filters too many cells. The client side would automatically handle this like it 
does with existing scan limits.
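Roughly, the enforcement could look like this (again a simplified sketch with invented names; the real logic would live in ScannerContext's ProgressFields/LimitFields and the existing checkSizeLimit checks):

```java
// Simplified sketch of the proposed block-IO limit enforcement.
public class BlockSizeLimit {
    private final long maxBlockBytes;   // limit, analogous to LimitFields
    private long blockBytesScanned;     // progress, incremented for every cell

    public BlockSizeLimit(long maxBlockBytes) {
        this.maxBlockBytes = maxBlockBytes;
    }

    // Track block IO for a cell regardless of whether a filter drops it.
    public void addBlockBytes(long bytes) {
        blockBytesScanned += bytes;
    }

    // Analogous to checkSizeLimit: once block IO crosses the limit, the scan
    // stops and returns a partial (possibly empty) response, and the client
    // transparently issues the next scan RPC, just like other scan limits.
    public boolean sizeLimitReached() {
        return blockBytesScanned >= maxBlockBytes;
    }
}
```

So a heavily filtered scan reading 64KB blocks against, say, a 4MB block limit would be forced back to the client after 64 blocks instead of running until its timeout.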

> Long running heavily filtered scans hold up too many ByteBuffAllocator buffers
> ------------------------------------------------------------------------------
>
>                 Key: HBASE-27227
>                 URL: https://issues.apache.org/jira/browse/HBASE-27227
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Bryan Beaudreault
>            Priority: Major
>         Attachments: Screen Shot 2022-07-20 at 10.52.40 AM.png
>
>
> We have a workload which is launching long running scans searching for a 
> needle in a haystack. They have a timeout of 60s, so are allowed to run on 
> the server for 30s. Most of the rows are filtered, and the final result is 
> usually only a few kb.
> When these scans are running, we notice our ByteBuffAllocator pool usage goes 
> to 100% and we start seeing 100+ MB/s of heap allocations. When the scans 
> finish, the pool goes back to normal and heap allocations go away.
> My working theory here is that we are only releasing ByteBuffs once we call 
> {{shipper.shipped()}}, which only happens once a response is returned to the 
> user. This works fine for normal scans which are likely to quickly find 
> enough results to return, but for long running scans in which most of the 
> results are filtered we end up holding on to more and more buffers until the 
> scan finally returns.
> We should consider whether it's possible to release buffers for blocks whose 
> cells have been completely skipped by a scan.
>  
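The retention behavior the issue describes can be modeled with a toy example (MockShipper and its methods are hypothetical stand-ins; the real release path goes through the region server's shipped callbacks):

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the described behavior: block buffers are retained until
// shipped() is called, which today only happens when a response goes out.
public class MockShipper {
    // Stand-ins for pooled ByteBuffs backing HFile blocks.
    private final List<byte[]> retained = new ArrayList<>();

    // Every block touched by the scan pins a buffer, even if all of its
    // cells are filtered out of the response.
    public void retain(byte[] blockBuffer) {
        retained.add(blockBuffer);
    }

    public int retainedCount() {
        return retained.size();
    }

    // Releases everything at once, as shipper.shipped() does today.
    public void shipped() {
        retained.clear();
    }
}
```

A long-running, heavily filtered scan keeps calling retain() without ever reaching shipped(), which matches the pool exhaustion described above.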



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
