Charles Connell created HBASE-28792:
---------------------------------------

             Summary: AsyncTableImpl calls coprocessor callbacks in undefined order
                 Key: HBASE-28792
                 URL: https://issues.apache.org/jira/browse/HBASE-28792
             Project: HBase
          Issue Type: Bug
          Components: Client
            Reporter: Charles Connell


To call a coprocessor endpoint asynchronously, you start by calling 
{{AsyncTable#coprocessorService()}}, which gives you a 
{{CoprocessorServiceBuilder}}, and a few steps later you can talk to your 
coprocessor over the network. One argument to 
{{AsyncTable#coprocessorService()}} is a {{CoprocessorCallback}} object, which 
contains several methods that will be called during the lifecycle of a 
coprocessor endpoint call. {{AsyncTableImpl}}'s implementation of 
{{AsyncTable#coprocessorService()}} wraps your {{CoprocessorCallback}} with its 
own that delegates the work to a thread pool. A snippet of this:
{code}
      @Override
      public void onRegionComplete(RegionInfo region, R resp) {
        pool.execute(context.wrap(() -> callback.onRegionComplete(region, resp)));
      }
...
      @Override
      public void onComplete() {
        pool.execute(context.wrap(callback::onComplete));
      }
{code}
The trouble with this is that your implementations of {{onRegionComplete()}} 
and {{onComplete()}} can end up being called in any order, and/or at 
the same time. The task of calling them is delegated to a thread pool, and the 
completion of those tasks is not waited on, so the thread pool can choose any 
ordering it wants. Troublingly, {{onComplete()}} can be called before the 
final {{onRegionComplete()}}, which is a violation of the contract specified 
in the {{CoprocessorCallback#onComplete()}} javadoc.
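
One possible way to keep the thread pool while preserving callback order is to 
funnel the callbacks through a serializing executor, along the lines of the 
{{SerialExecutor}} example in the {{java.util.concurrent.Executor}} javadoc. 
The sketch below is not HBase code and is not necessarily how this issue will 
be resolved; {{SerialExecutor}}, {{OrderedCallbackSketch}} and {{deliver()}} are 
hypothetical names, and the callbacks are stand-ins that only record the order 
in which they ran.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: runs tasks on a shared pool one at a time, in submission order. */
class SerialExecutor implements Executor {
  private final Queue<Runnable> tasks = new ArrayDeque<>();
  private final Executor pool;
  private Runnable active;

  SerialExecutor(Executor pool) {
    this.pool = pool;
  }

  @Override
  public synchronized void execute(Runnable task) {
    // Wrap the task so that finishing it hands the next queued task to the pool.
    tasks.add(() -> {
      try {
        task.run();
      } finally {
        scheduleNext();
      }
    });
    if (active == null) {
      scheduleNext();
    }
  }

  private synchronized void scheduleNext() {
    active = tasks.poll();
    if (active != null) {
      pool.execute(active);
    }
  }
}

class OrderedCallbackSketch {
  /** Submits one stand-in region callback per region, then a completion callback. */
  static List<String> deliver(int regions) {
    ExecutorService pool = Executors.newFixedThreadPool(4);
    try {
      Executor ordered = new SerialExecutor(pool);
      List<String> events = Collections.synchronizedList(new ArrayList<>());
      CountDownLatch done = new CountDownLatch(1);
      for (int i = 0; i < regions; i++) {
        int region = i;
        ordered.execute(() -> events.add("onRegionComplete:" + region));
      }
      ordered.execute(() -> {
        events.add("onComplete");
        done.countDown();
      });
      if (!done.await(10, TimeUnit.SECONDS)) {
        throw new IllegalStateException("callbacks did not finish in time");
      }
      return new ArrayList<>(events);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    System.out.println(deliver(3));
  }
}
```

With the serializing wrapper, the recorded events always end with 
{{onComplete}}, and the region callbacks appear in submission order, even 
though the work still runs on pool threads.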

I discovered this while working on HBASE-28770. I found that 
{{AsyncAggregationClient#rowCount()}} returns incorrect results 5-10% of the 
time, and this bug is the reason. I presume other {{AsyncAggregationClient}} 
methods are similarly affected.
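
To illustrate how a misordered {{onComplete()}} can produce a wrong count, here 
is a deterministic sketch. It assumes a callback that adds up per-region row 
counts in {{onRegionComplete()}} and completes a future in {{onComplete()}}; 
{{SummingCallback}} and {{RowCountRaceSketch}} are hypothetical stand-ins, not 
the actual {{AsyncAggregationClient}} code, and the bad ordering is forced by 
hand rather than produced by a thread pool.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicLong;

class RowCountRaceSketch {
  /** Hypothetical stand-in for a callback that sums per-region row counts. */
  static final class SummingCallback {
    private final AtomicLong sum = new AtomicLong();
    final CompletableFuture<Long> future = new CompletableFuture<>();

    void onRegionComplete(String region, long rows) {
      sum.addAndGet(rows);
    }

    void onComplete() {
      // Publishes whatever has been accumulated so far; later regions are lost.
      future.complete(sum.get());
    }
  }

  /** Callbacks arrive in the order the contract promises. */
  static long inOrder() {
    SummingCallback cb = new SummingCallback();
    cb.onRegionComplete("r1", 10);
    cb.onRegionComplete("r2", 20);
    cb.onComplete();
    return cb.future.join();
  }

  /** onComplete() runs before the final onRegionComplete(). */
  static long completeTooEarly() {
    SummingCallback cb = new SummingCallback();
    cb.onRegionComplete("r1", 10);
    cb.onComplete();
    cb.onRegionComplete("r2", 20); // arrives after the result was published
    return cb.future.join();
  }

  public static void main(String[] args) {
    System.out.println("in order:           " + inOrder());
    System.out.println("complete too early: " + completeTooEarly());
  }
}
```

In this sketch the in-order run reports 30 rows, while the run where 
{{onComplete()}} fires early reports only 10, because the second region's 
count is added after the future has already been completed.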



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
