[jira] [Commented] (HBASE-27558) Scan quotas and limits should account for total block IO

Bryan Beaudreault (Jira) Fri, 06 Jan 2023 15:15:56 -0800


    [ 
https://issues.apache.org/jira/browse/HBASE-27558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17655600#comment-17655600
 ]


Bryan Beaudreault commented on HBASE-27558:
-------------------------------------------

As of HBASE-18294, the ScannerContext dataSize and heapSize fields are almost 
identical. dataSize is "cell.getSerializedSize() + Bytes.SIZEOF_INT", per 
PrivateCellUtil.estimatedSerializedSizeOf. heapSize is 
"cell.getSerializedSize() + FIXED_OVERHEAD", per the cell implementations of 
heapSize(). The fixed overhead is often on the order of 50-60 bytes, 
depending on the extra fields in each object. It seems rather pointless to 
have two such similar values, and from a read perspective the heapSize is 
actually incorrect.

On the server side, the actual memory retained for a read must include the 
actual length of the block(s) backing those cells. The full blocks are held in 
memory until the request is finished and they are released. So for 
ScannerContext I suggest we increment heapSize by cell.heapSize() - 
cell.getSerializedSize(). We’d also increment it by blockSize for each block 
loaded (and retained) during the request.

Additionally, I will add a new "blockSize" field to ScannerContext, which will 
be incremented for all blocks read during the request (not just those 
retained). The difference between this and heapSize would depend on how many 
of the requested blocks could be released early due to filters (see 
HBASE-27227).
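
To make the proposed accounting concrete, here is a minimal sketch of the three 
counters. It is not HBase code: the class ScanSizeSketch and its methods are 
hypothetical names, and the "retained" flag is an assumed way of modelling a 
block that filters allowed to be released early. Integer.BYTES stands in for 
Bytes.SIZEOF_INT.

```java
/** Hypothetical stand-in for the proposed ScannerContext counters; not HBase code. */
final class ScanSizeSketch {
    long dataSize;  // serialized bytes of the cells returned
    long heapSize;  // estimated bytes retained on the server for this request
    long blockSize; // bytes of every block read, retained or not

    /** Account for one cell added to the results. */
    void addCell(long serializedSize, long cellHeapSize) {
        dataSize += serializedSize + Integer.BYTES;
        // Count only the per-object overhead here; the cell's bytes live in a
        // backing block, which addBlock() counts in full.
        heapSize += cellHeapSize - serializedSize;
    }

    /** Account for one block read; retained=false models an early release (assumption). */
    void addBlock(long blockBytes, boolean retained) {
        blockSize += blockBytes;
        if (retained) {
            heapSize += blockBytes;
        }
    }

    public static void main(String[] args) {
        ScanSizeSketch ctx = new ScanSizeSketch();
        ctx.addBlock(65536, true);  // block backing a returned cell
        ctx.addCell(100, 156);      // 100 serialized bytes, 56 bytes of object overhead
        ctx.addBlock(65536, false); // block fully filtered and released early
        System.out.println(ctx.dataSize + " " + ctx.heapSize + " " + ctx.blockSize);
    }
}
```

With these illustrative numbers, heapSize reflects one retained 64 KB block plus 
the cell overhead, while blockSize reflects both blocks read.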

> Scan quotas and limits should account for total block IO
> --------------------------------------------------------
>
>                 Key: HBASE-27558
>                 URL: https://issues.apache.org/jira/browse/HBASE-27558
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Bryan Beaudreault
>            Priority: Major
>
> Scan and Multi requests pull the byte throughput limit from 
> Quotas.getReadAvailable(). Multis validate the result inline in 
> RSRpcServices, by checking the accumulated 
> {{RpcCallContext.getResponseCellSize}} and {{getResponseBlockSize}} against 
> the read available after each action. Scans make use of 
> {{ScannerContext}}, and only check the total cell serialized size and 
> {{cell.heapSize()}}.
> The handling for Multis was added in HBASE-14978. The block size is checked 
> because regardless of the actual cell size, the regionserver needs to retain 
> entire blocks backing those cells for the lifetime of a request. If the 
> retained blocks grow too large, a regionserver can OOM or experience heavy 
> GC pressure.
> So multis validate read available against the actual block size retained for 
> the responses, but scans only account for cell sizes. We should extend the 
> same block support to scans through ScannerContext tracking block bytes 
> scanned.
> Large scans can read over ranges of both returned and filtered rows. 
> Regardless of what is returned to the user, the server-side cost of the 
> scan is affected just as much by filtered rows as by returned ones.
> Both Scans and Multis take the Math.min of Quotas read available and 
> hbase.server.scanner.max.result.size. Scans further take the min of that and 
> Scan.setMaxResultSize.
>  
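
The limit selection described at the end of the quoted description can be 
sketched as follows. This is an illustration only: ScanLimitSketch, 
effectiveLimit and limitReached are hypothetical names, treating a 
non-positive Scan.setMaxResultSize value as "not set" is an assumption, and 
the block-bytes check shows one way the proposed scan behaviour might mirror 
what Multis already do.

```java
/** Hypothetical helper illustrating the limit selection; not HBase code. */
final class ScanLimitSketch {
    /**
     * Smallest of the quota's read available, the server-wide
     * hbase.server.scanner.max.result.size, and the per-scan max result size.
     */
    static long effectiveLimit(long quotaReadAvailable, long serverMaxResultSize,
                               long scanMaxResultSize) {
        long limit = Math.min(quotaReadAvailable, serverMaxResultSize);
        // Assumption: a non-positive per-scan value means the client did not set one.
        return scanMaxResultSize > 0 ? Math.min(limit, scanMaxResultSize) : limit;
    }

    /** True once either the cell bytes or the block bytes reach the limit. */
    static boolean limitReached(long limit, long cellBytes, long blockBytes) {
        return cellBytes >= limit || blockBytes >= limit;
    }

    public static void main(String[] args) {
        long limit = effectiveLimit(1000, 500, 200);
        // Few cell bytes were returned, but a whole 64 KB block was read.
        System.out.println(limit + " " + limitReached(limit, 50, 65536));
    }
}
```

The example shows why block bytes matter: a heavily filtered scan returns few 
cell bytes yet can still exceed the limit through the blocks it reads.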



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
