Scans do not honor the operation timeout configuration, because their logic
is a bit different from that of normal read/write operations.

For a scan, there is usually no simple 'retry' (except for the open scanner
call). If you hit an error, you usually need to restart the scan by making a
new open scanner call, rather than retrying the scanner next call.
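
To illustrate the restart pattern, here is a minimal sketch. It is a
dependency-free model, not the HBase client code: RowScanner, ScannerOpener,
scanAll and flakyOpener are hypothetical names made up for this example. It
assumes the caller remembers the last row it received and reopens the scanner
just after that row whenever a next call fails.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class ScanRestartSketch {

    /** Hypothetical stand-in for an open scanner; NOT the HBase API. */
    interface RowScanner {
        /** Returns the next row key, or null when the scan is exhausted. */
        String next() throws IOException;
    }

    /** Hypothetical stand-in for the "open scanner" call. */
    interface ScannerOpener {
        /** Opens a scanner positioned strictly after afterRow (null = from the start). */
        RowScanner open(String afterRow) throws IOException;
    }

    /** Reads all rows; a failed next() is handled by reopening, never by calling next() again. */
    static List<String> scanAll(ScannerOpener opener, int maxRestarts) throws IOException {
        List<String> rows = new ArrayList<>();
        String lastRow = null;   // last row successfully returned to the caller
        int restarts = 0;
        RowScanner scanner = opener.open(lastRow);
        while (true) {
            String row;
            try {
                row = scanner.next();
            } catch (IOException e) {
                if (++restarts > maxRestarts) {
                    throw e;     // give up once the restart budget is spent
                }
                // The old scanner is abandoned; open a new one after lastRow.
                scanner = opener.open(lastRow);
                continue;
            }
            if (row == null) {
                return rows;     // scan exhausted
            }
            rows.add(row);
            lastRow = row;
        }
    }

    /** In-memory opener whose scanners fail exactly once, on the third next() call overall. */
    static ScannerOpener flakyOpener(List<String> data) {
        int[] calls = {0};
        return afterRow -> {
            int[] pos = {afterRow == null ? 0 : data.indexOf(afterRow) + 1};
            return () -> {
                if (++calls[0] == 3) {
                    throw new IOException("simulated next() failure");
                }
                return pos[0] < data.size() ? data.get(pos[0]++) : null;
            };
        };
    }

    public static void main(String[] args) throws IOException {
        List<String> data = List.of("a", "b", "c", "d");
        System.out.println(scanAll(flakyOpener(data), 3)); // prints [a, b, c, d]
    }
}
```

In this model the failure happens after "a" and "b" have been read, so the
scan is reopened after "b" and continues with "c" and "d". This is why a
single per-operation deadline does not map cleanly onto a scan: each restart
is a new open scanner call rather than a retry of the same request.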

IIRC we have a special hbase.client.scanner.timeout.period and also a
special hbase.rpc.timeout for meta?

Thanks.

Bryan Beaudreault <[email protected]> wrote on Wed, Jun 1, 2022 at 00:47:

> Hi all,
>
> We just had a production issue where a user-facing API service had a low
> hbase.rpc.timeout, and this majorly contributed to a meta hotspotting
> issue. The issue is, user requests can only be submitted once the necessary
> RegionLocation is in the MetaCache. But in a meta hotspotting scenario it
> may be impossible to return a RegionLocation for hbase:meta in a timely
> manner. This will trigger the rpc timeout, which may result in a number of
> retries. This retry storm (across many client instances) can further
> exacerbate meta hotspotting issues.
>
> My thought is to decouple meta rpc timeout from user rpc timeouts, because
> generally you would prefer to allow a longer meta request to succeed
> because it may unblock many user requests.
>
> I think our current timeouts for meta scans are a bit confusing. There's
> a hbase.client.meta.operation.timeout, but actually that does not apply to
> meta scans. Instead they are configured via hbase.rpc.timeout
> and hbase.client.scanner.timeout.period.
>
> I was considering special casing meta scans so that they are configured via
> (new) hbase.client.meta.rpc.timeout and (existing)
> hbase.client.meta.operation.timeout. This would be different from typical
> scan requests, but may be more intuitive overall? Does anyone have any
> opinions?
>
> See https://issues.apache.org/jira/browse/HBASE-27078
>
