Hi Tim,

I am seeing 'timed out' assertions for 2 custom cluster tests in
test_admission_controller.py: test_admission_controller_with_flags
and test_admission_controller_with_configs. I put a debug statement at
line 512 of test_admission_controller.py as follows:

 def run(self):
      client = None
      try:
        try:
          .............
        except ImpalaBeeswaxException as e:
          if "Rejected" in str(e):
            ............
          elif "exceeded timeout" in str(e):
            LOG.debug("Query %s timed out", self.query_num)
            self.query_state = 'TIMED OUT'
            print "Query " + self.query_state   # added this line
            return
          else:
            raise e
        finally:
          ..................


I found that queries in both test cases are timing out:
Query TIMED OUT
Query TIMED OUT

The metrics printed in the logs are as follows:
Final Metric:  {'dequeued': 13, 'rejected': 0, 'released': 28, 'admitted':
28, 'queued': 15, 'timed-out': 2}
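
For context, the check that trips here boils down to something like the
following (a minimal sketch only: the metric names come from the log line
above, and I'm assuming the test compares metric deltas and expects no
timed-out queries; the actual assertion in test_admission_controller.py may
be phrased differently):

# Sketch only: assumes the test expects zero timed-out admissions.
metric_deltas = {'dequeued': 13, 'rejected': 0, 'released': 28,
                 'admitted': 28, 'queued': 15, 'timed-out': 2}
assert metric_deltas['timed-out'] == 0, \
    "%d queries timed out while queued" % metric_deltas['timed-out']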


The assertion is similar to the one mentioned in JIRA: IMPALA-3772.

Is this issue similar to the one you mentioned earlier in this thread?

Regards,
Valencia





From:   Nishidha Panpaliya/Austin/Contr/IBM
To:     Tim Armstrong <[email protected]>
Cc:     [email protected], Manish
            Patil/Austin/Contr/IBM@IBMUS, Sudarshan
            Jagadale/Austin/Contr/IBM@IBMUS, Valencia
            Serrao/Austin/Contr/IBM@IBMUS
Date:   06/24/2016 11:56 AM
Subject:        Re: Custom cluster test failure in test_exchange_delays.py


Thanks a lot Tim.

We tried running the query in impala-shell after starting the Impala cluster
with the given parameters, but the query still passed. So we tried changing
the delay to 20000 and got the expected exception. The same behaviour was
verified in the test case by changing its delay argument, as sketched below.
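
Roughly, the change looks like this (a sketch only: the decorator and argument
names are written from memory of the custom cluster test framework, so the
exact form in test_exchange_delays.py may differ; the real test uses a delay
of 10000):

from tests.common.custom_cluster_test_suite import CustomClusterTestSuite

class TestExchangeDelays(CustomClusterTestSuite):
  # Sketch only: start impalads with a 20000 ms receiver delay instead of the
  # 10000 ms the test normally uses, keeping the 5000 ms sender timeout.
  @CustomClusterTestSuite.with_args(
      impalad_args="--stress_datastream_recvr_delay_ms=20000 "
                   "--datastream_sender_timeout_ms=5000")
  def test_exchange_small_delay(self, vector):
    self.run_test_case('QueryTest/exchange-delays', vector)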

But as you said, if the problem is timing sensitive and is seen on other
platforms too, we will not change the test case (to increase the delay) just
to make it pass. We can ignore the failure.

Thanks again,
Nishidha




From:   Tim Armstrong <[email protected]>
To:     [email protected]
Cc:     Sudarshan Jagadale/Austin/Contr/IBM@IBMUS, Valencia
            Serrao/Austin/Contr/IBM@IBMUS, Manish
            Patil/Austin/Contr/IBM@IBMUS, Nishidha
            Panpaliya/Austin/Contr/IBM@IBMUS
Date:   06/23/2016 10:30 PM
Subject:        Re: Custom cluster test failure in test_exchange_delays.py



Hmm, that test is potentially timing sensitive. We've seen problems when
running with slow builds (e.g. code coverage) or on a particularly slow
machine, e.g. a single-core VM. It's probably OK to skip the test on PowerPC
if this is the case.

The query is expected to fail, but in this case no failure is happening.
It's a "custom cluster test" that configures the cluster in a way that makes
queries fail with a timeout. It's test coverage for a bug where Impala
returned incorrect results when the timeout happened.

If you run the query on Impala with the default startup arguments it should
succeed.

If you start up Impala with the special configuration used by those tests,
it should fail. E.g. locally I get:

tarmstrong@tarmstrong-box:~/Impala/Impala$ ./bin/start-impala-cluster.py
--impalad_args=--datastream_sender_timeout_ms=5000
--impalad_args=--stress_datastream_recvr_delay_ms=10000
Starting State Store logging
to /home/tarmstrong/Impala/Impala/logs/cluster/statestored.INFO
Starting Catalog Service logging
to /home/tarmstrong/Impala/Impala/logs/cluster/catalogd.INFO
Starting Impala Daemon logging
to /home/tarmstrong/Impala/Impala/logs/cluster/impalad.INFO
Starting Impala Daemon logging
to /home/tarmstrong/Impala/Impala/logs/cluster/impalad_node1.INFO
Starting Impala Daemon logging
to /home/tarmstrong/Impala/Impala/logs/cluster/impalad_node2.INFO
MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
MainThread: Getting num_known_live_backends from tarmstrong-box:25000
MainThread: Waiting for num_known_live_backends=3. Current value: 0
MainThread: Getting num_known_live_backends from tarmstrong-box:25000
MainThread: Waiting for num_known_live_backends=3. Current value: 0
MainThread: Getting num_known_live_backends from tarmstrong-box:25000
MainThread: Waiting for num_known_live_backends=3. Current value: 2
MainThread: Getting num_known_live_backends from tarmstrong-box:25000
MainThread: Waiting for num_known_live_backends=3. Current value: 2
MainThread: Getting num_known_live_backends from tarmstrong-box:25000
MainThread: num_known_live_backends has reached value: 3
Waiting for Catalog... Status: 63 DBs / 1091 tables (ready=True)
MainThread: Getting num_known_live_backends from tarmstrong-box:25001
MainThread: num_known_live_backends has reached value: 3
Waiting for Catalog... Status: 63 DBs / 1091 tables (ready=True)
MainThread: Getting num_known_live_backends from tarmstrong-box:25002
MainThread: num_known_live_backends has reached value: 3
Waiting for Catalog... Status: 63 DBs / 1091 tables (ready=True)
Impala Cluster Running with 3 nodes.
tarmstrong@tarmstrong-box:~/Impala/Impala$ impala-shell.sh
Starting Impala Shell without Kerberos authentication
Connected to tarmstrong-box.ca.cloudera.com:21000
Server version: impalad version 2.6.0-cdh5-INTERNAL DEBUG (build
fe23dbf0465220a0c40a5c8431cb6a536e19dc6b)
***********************************************************************************

Welcome to the Impala shell. Copyright (c) 2015 Cloudera, Inc. All rights
reserved.
(Impala Shell v2.6.0-cdh5-INTERNAL (fe23dbf) built on Fri May 13 11:15:16
PDT 2016)

You can run a single query from the command line using the '-q' option.
***********************************************************************************

[tarmstrong-box.ca.cloudera.com:21000] > select count(*)
                                       > from tpch.lineitem
                                       >   inner join tpch.orders on
l_orderkey = o_orderkey
                                       > ;
Query: select count(*)
from tpch.lineitem
  inner join tpch.orders on l_orderkey = o_orderkey
WARNINGS:

Sender timed out waiting for receiver fragment instance:
4cbdf04962743c:faa6717f926b5183



 (1 of 2 similar)


You could try increasing the delay on your setup to see if you can
replicate the failure.


On Thu, Jun 23, 2016 at 3:54 AM, Nishidha Panpaliya <[email protected]>
wrote:

  Hi All,

  On power8, we are getting 3 failures in the custom cluster tests: 2 test
  cases failed in test_admission_controller.py and 1 in
  test_exchange_delays.py. I investigated the failure in
  test_exchange_delays.py and below are my findings.

  1. The failing test case is "test_exchange_small_delay". This test uses the
     input test file "QueryTest/exchange-delays",
     --stress_datastream_recvr_delay_ms=10000 and
     --datastream_sender_timeout_ms=5000.
  2. The test is expected to throw an exception, with the message mentioned in
     the CATCH section of QueryTest/exchange-delays.
  3. However, at our end, the query in this test does not throw any exception,
     but since QueryTest/exchange-delays has a CATCH section, the test case
     fails due to the assertion in tests/common/impala_test_suite.py below
     (see the sketch after this list):
              if 'CATCH' in test_section:
                assert test_section['CATCH'].strip() == ''
  4. If I remove the CATCH section from the exchange-delays.test file, this
     test case passes; however, another test case in the same test file then
     fails, because it throws an exception as per its inputs but the CATCH
     section is missing.
  5. On another RHEL ppc machine, this test randomly passes, i.e. both test
     cases throw the exception as expected.
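
  To make the failure mode concrete, the CATCH handling boils down to roughly
  the following (a simplified sketch: only the quoted if/assert above comes
  from tests/common/impala_test_suite.py; the surrounding function and names
  are my assumptions):

  def run_query_and_check_catch(execute_query, test_section):
    # Sketch only: names here are placeholders, not the real framework code.
    try:
      result = execute_query(test_section['QUERY'])
    except Exception as e:
      if 'CATCH' in test_section:
        # The query failed as expected: the error must contain the CATCH text.
        assert test_section['CATCH'].strip() in str(e)
        return None
      raise
    # The query succeeded: a non-empty CATCH section means a failure was
    # expected, so this assert trips (which is what we see on power8).
    if 'CATCH' in test_section:
      assert test_section['CATCH'].strip() == ''
    return result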

  I'm really confused as to what parameter is causing the test case
  "test_exchange_small_delay" not to throw any exception in my setup, or what
  should actually be happening.
  I checked the latest cdh5-trunk code on GitHub and it has the same test code
  and the same content in the query test file.

  Kindly provide me some pointers.

  Thanks,
  Nishidha

