[jira] [Commented] (HBASE-28358) AsyncProcess inconsistent exception thrown for operation timeout

Duo Zhang (Jira) Fri, 26 Jul 2024 20:11:04 -0700


    [ https://issues.apache.org/jira/browse/HBASE-28358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17869064#comment-17869064 ]


Duo Zhang commented on HBASE-28358:
-----------------------------------

Ping [~bbeaudreault].

There is also a related PR

https://github.com/apache/hbase/pull/6000

I think it is reasonable, but since it is a behavior change, we need to 
discuss it more.

Thanks.

> AsyncProcess inconsistent exception thrown for operation timeout
> ----------------------------------------------------------------
>
>                 Key: HBASE-28358
>                 URL: https://issues.apache.org/jira/browse/HBASE-28358
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Bryan Beaudreault
>            Priority: Major
>
> I'm not sure if I'll get to this, but wanted to log it as a known issue.
> AsyncProcess has a design where it breaks the batch into sub-batches based on 
> regionserver, then submits a callable per regionserver in a threadpool. In 
> the main thread, it calls waitUntilDone() with an operation timeout. If the 
> callables don't finish within the operation timeout, a SocketTimeoutException 
> is thrown. This exception is not very useful because it doesn't give you any 
> sense of how many calls were in progress, on which servers, or why it's 
> delayed.
> Recently we've been improving adherence to the operation timeout within the 
> callables themselves. The main driver here has been to ensure we don't 
> erroneously clear the meta cache for operation-timeout-related errors. So 
> we've added a new OperationTimeoutExceededException, which is thrown from 
> within the callables and does not cause a meta cache clear. The added benefit 
> is that if these bubble up to the caller, they are wrapped in 
> RetriesExhaustedWithDetailsException which includes a lot more info about 
> which server and which action is affected. 
> Now we've covered most but not all cases where operation timeout is exceeded. 
> So when exceeding operation timeout it's possible sometimes to see a 
> SocketTimeoutException from waitUntilDone, and sometimes see 
> OperationTimeoutExceededException from the callables. It will depend on which 
> one fails first. It may be nice to finish the swing here, ensuring that we 
> always throw OperationTimeoutExceededException from the callables.
> The main remaining case is the call to locateRegion, which hits meta and 
> does not honor the call's operation timeout (it uses the meta operation timeout instead). 
> Resolving this would require some refactoring of 
> ConnectionImplementation.locateRegion to allow passing an operation timeout 
> and having that affect the userRegionLock and meta scan.
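A minimal Java sketch of the race described above, under stated assumptions. It is not AsyncProcess or ConnectionImplementation code: OperationTimeoutExceeded, regionLock, serverCall, runBatch and GRACE_MILLIS are hypothetical names. regionLock stands in for userRegionLock, and the demo holds it to simulate a slow meta lookup. With lockHonorsDeadline=true each callable bounds its lock wait by the remaining operation timeout and fails on its own with a per-server message; with false it blocks the way locateRegion does today, so the main-thread wait surfaces a bare SocketTimeoutException. The grace period on the main-thread wait is an assumption of this sketch, added so that deadline-aware callables report first deterministically.

```java
import java.net.SocketTimeoutException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.locks.ReentrantLock;

public class OperationTimeoutSketch {

  // Hypothetical stand-in for OperationTimeoutExceededException (not the HBase class).
  static class OperationTimeoutExceeded extends Exception {
    OperationTimeoutExceeded(String msg) { super(msg); }
  }

  // Hypothetical stand-in for userRegionLock; the demo holds it to simulate a slow meta lookup.
  static final ReentrantLock regionLock = new ReentrantLock();

  // Assumed grace period for the main-thread wait so deadline-aware callables report first.
  static final long GRACE_MILLIS = 200;

  // Per-server callable: "locate the region" under regionLock, then "send the RPC".
  static Callable<String> serverCall(String server, long deadlineNanos, boolean lockHonorsDeadline) {
    return () -> {
      if (lockHonorsDeadline) {
        long remaining = deadlineNanos - System.nanoTime();
        if (!regionLock.tryLock(Math.max(remaining, 0), TimeUnit.NANOSECONDS)) {
          // Fails inside the callable, so the caller learns which server was affected.
          throw new OperationTimeoutExceeded("operation timeout exceeded locating region for " + server);
        }
      } else {
        // Ignores the caller's deadline, like locateRegion using the meta operation timeout.
        regionLock.lockInterruptibly();
      }
      try {
        return server; // the RPC itself is elided in this sketch
      } finally {
        regionLock.unlock();
      }
    };
  }

  // Main thread: one callable per server, then wait for all of them until the deadline.
  static List<String> runBatch(List<String> servers, long operationTimeoutMillis,
      boolean lockHonorsDeadline) throws SocketTimeoutException, InterruptedException {
    ExecutorService pool = Executors.newFixedThreadPool(servers.size());
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(operationTimeoutMillis);
    long waitDeadline = deadline + TimeUnit.MILLISECONDS.toNanos(GRACE_MILLIS);
    List<Future<String>> futures = new ArrayList<>();
    List<String> failures = new ArrayList<>();
    try {
      for (String server : servers) {
        futures.add(pool.submit(serverCall(server, deadline, lockHonorsDeadline)));
      }
      for (Future<String> f : futures) {
        try {
          f.get(Math.max(waitDeadline - System.nanoTime(), 0), TimeUnit.NANOSECONDS);
        } catch (TimeoutException e) {
          // Generic timeout: no information about which servers were still in progress.
          throw new SocketTimeoutException("operation timeout in waitUntilDone");
        } catch (ExecutionException e) {
          failures.add(e.getCause().getMessage());
        }
      }
      return failures;
    } finally {
      pool.shutdownNow(); // interrupts callables still blocked in lockInterruptibly()
    }
  }

  public static void main(String[] args) throws Exception {
    regionLock.lock(); // simulate another thread stuck in a slow meta lookup
    try {
      System.out.println(runBatch(List.of("rs1", "rs2"), 100, true));
      try {
        runBatch(List.of("rs1", "rs2"), 100, false);
      } catch (SocketTimeoutException e) {
        System.out.println("SocketTimeoutException: " + e.getMessage());
      }
    } finally {
      regionLock.unlock();
    }
  }
}
```

The first call reports one message per server naming the affected server; the second only reports a generic timeout, which mirrors the difference between the two exception paths in the description.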



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
