Tim Armstrong created IMPALA-6564:
-------------------------------------

             Summary: Queries randomly fail with "CANCELLED" due to a race with 
IssueInitialRanges()
                 Key: IMPALA-6564
                 URL: https://issues.apache.org/jira/browse/IMPALA-6564
             Project: IMPALA
          Issue Type: Bug
          Components: Backend
    Affects Versions: Impala 2.12.0
            Reporter: Tim Armstrong
            Assignee: Tim Armstrong


I've been chasing a flaky test that I saw in test_basic_runtime_filters when 
running against https://gerrit.cloudera.org/#/c/8966/ (the scanner buffer pool 
changes).

I think it is a latent bug that has started reproducing more frequently. What 
I've found is:
* Different queries fail with CANCELLED. I can repro it on my branch ~3/4 times 
by running: impala-py.test tests/query_test/test_runtime_filters.py -n8 
--verbose --maxfail 1 -k basic . It happens with a variety of queries and file 
formats.
* It seems to happen when all files are pruned out by runtime filters
* Logging reveals IssueInitialRanges() fails with a CANCELLED status, which 
propagates up to the query status:
{code}
  if (!initial_ranges_issued_) {
    // We do this in GetNext() to maximise the amount of work we can do while 
waiting for
    // runtime filters to show up. The scanner threads have already started (in 
Open()),
    // so we need to tell them there is work to do.
    // TODO: This is probably not worth splitting the organisational cost of 
splitting
    // initialisation across two places. Move to before the scanner threads 
start.
    Status status = IssueInitialScanRanges(state);
    if (!status.ok()) LOG(INFO) << runtime_state_->fragment_instance_id() << " 
IssueInitialRanges() failed with status: " << status.GetDetail()  << " " << 
(void*) this;
{code}
* It appears that the CANCELLED comes from DiskIoMgr::AddScanRanges().
* That function returned cancelled because a scanner thread noticed that the 
scan was complete here and cancelled the RequestContext:
{code}
    // Done with range and it completed successfully
    if (progress_.done()) {
      // All ranges are finished.  Indicate we are done.
      LOG(INFO) << runtime_state_->fragment_instance_id() << " All ranges done 
" << (void*) this;
      SetDone();
      break;
    }
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to