Tim Armstrong created IMPALA-6564:

             Summary: Queries randomly fail with "CANCELLED" due to a race with 
                 Key: IMPALA-6564
                 URL: https://issues.apache.org/jira/browse/IMPALA-6564
             Project: IMPALA
          Issue Type: Bug
          Components: Backend
    Affects Versions: Impala 2.12.0
            Reporter: Tim Armstrong
            Assignee: Tim Armstrong

I've been chasing a flaky test that I saw in test_basic_runtime_filters when 
running against https://gerrit.cloudera.org/#/c/8966/ (the scanner buffer pool 

I think it is a latent bug that has started reproducing more frequently. What 
I've found is:
* Different queries fail with CANCELLED. I can repro it on my branch ~3/4 times 
by running: impala-py.test tests/query_test/test_runtime_filters.py -n8 
--verbose --maxfail 1 -k basic . It happens with a variety of queries and file 
* It seems to happen when all files are pruned out by runtime filters
* Logging reveals that IssueInitialScanRanges() fails with a CANCELLED status, 
which propagates up to the query status:
  if (!initial_ranges_issued_) {
    // We do this in GetNext() to maximise the amount of work we can do while
    // waiting for
    // runtime filters to show up. The scanner threads have already started (in 
    // so we need to tell them there is work to do.
    // TODO: This is probably not worth splitting the organisational cost of 
    // initialisation across two places. Move to before the scanner threads 
    Status status = IssueInitialScanRanges(state);
    if (!status.ok()) LOG(INFO) << runtime_state_->fragment_instance_id()
        << " IssueInitialRanges() failed with status: " << status.GetDetail()
        << " " << (void*) this;
* It appears that the CANCELLED comes from DiskIoMgr::AddScanRanges().
* That function returned CANCELLED because a scanner thread noticed that the 
scan was complete here and cancelled the RequestContext:
    // Done with range and it completed successfully
    if (progress_.done()) {
      // All ranges are finished.  Indicate we are done.
      LOG(INFO) << runtime_state_->fragment_instance_id() << " All ranges done "
                << (void*) this;
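
The interleaving described above can be illustrated with a small, self-contained 
C++ sketch. It is only a model under stated assumptions: the RequestContext 
class and its AddScanRanges() method below are simplified, hypothetical 
stand-ins for the real DiskIoMgr code, and the Interleaving enum and 
RunInterleaving() helper are invented purely for illustration. The sketch 
assumes every range has already been pruned, so the scanner-side "all ranges 
done" check fires as soon as it runs.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical, simplified status codes; not the real impala::Status.
enum class StatusCode { OK, CANCELLED };

// Hypothetical stand-in for the I/O request context shared between the
// thread calling GetNext() and the scanner threads.
class RequestContext {
 public:
  void Cancel() {
    std::lock_guard<std::mutex> l(lock_);
    cancelled_ = true;
  }

  // Models the observed behaviour: adding ranges to a cancelled context
  // reports CANCELLED, even if there is nothing left to scan.
  StatusCode AddScanRanges(int num_ranges) {
    std::lock_guard<std::mutex> l(lock_);
    if (cancelled_) return StatusCode::CANCELLED;
    num_queued_ += num_ranges;
    return StatusCode::OK;
  }

 private:
  std::mutex lock_;
  bool cancelled_ = false;
  int num_queued_ = 0;
};

// The two orderings of the GetNext() thread and a scanner thread.
enum class Interleaving { kIssueThenScannerCheck, kScannerCheckThenIssue };

// Replays one ordering deterministically and returns the status that the
// range-issuing step would see. Assumes all files were pruned by filters.
StatusCode RunInterleaving(Interleaving order) {
  RequestContext ctx;
  const bool all_ranges_done = true;  // assumption: nothing left to scan

  // Scanner-thread side: on seeing the scan complete, cancel the context.
  auto scanner_check = [&ctx, all_ranges_done]() {
    if (all_ranges_done) ctx.Cancel();
  };

  if (order == Interleaving::kScannerCheckThenIssue) {
    scanner_check();              // scanner thread wins the race
    return ctx.AddScanRanges(0);  // issuing thread now sees CANCELLED
  }
  StatusCode status = ctx.AddScanRanges(0);  // issuing thread goes first
  scanner_check();
  return status;
}
```

In this model, RunInterleaving(Interleaving::kIssueThenScannerCheck) yields OK 
and RunInterleaving(Interleaving::kScannerCheckThenIssue) yields CANCELLED. 
That is consistent with the failure being intermittent: the outcome depends 
only on which thread reaches the shared context first.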

This message was sent by Atlassian JIRA
