Hi Valencia,

It does look like a timing-related failure, but maybe a different one from IMPALA-3772. You could try applying this fix we have in review: https://gerrit.cloudera.org/#/c/3450/1
It's curious that there are all these timing failures. Are you running on a small VM or something like that? We typically run the tests on a fairly modest 2-core VM and we don't generally see these tests failing. There are a few tests that we know will fail on very slow machines or build types; we've used "@SkipIfBuildType.not_dev_build" and "specific_build_type_timeout" to deal with some of those cases.

On Fri, Jun 24, 2016 at 12:07 AM, Valencia Serrao <[email protected]> wrote:

> Hi Tim,
>
> I am seeing 'timed out' assertions for 2 custom cluster tests in
> test_admission_controller.py: tests test_admission_controller_with_flags
> and test_admission_controller_with_configs. I put a debug statement at
> line number 512 in test_admission_controller.py, as follows:
>
>     def run(self):
>       client = None
>       try:
>         try:
>           .............
>         except ImpalaBeeswaxException as e:
>           if "Rejected" in str(e):
>             ............
>           elif "exceeded timeout" in str(e):
>             LOG.debug("Query %s timed out", self.query_num)
>             self.query_state = 'TIMED OUT'
>             print "Query " + self.query_state  # added this line
>             return
>           else:
>             raise e
>       finally:
>         ..................
>
> I found that the queries in both test cases are getting timed out:
>
>     Query TIMED OUT
>     Query TIMED OUT
>
> The metrics printed in the logs are as follows:
>
>     Final Metric: {'dequeued': 13, 'rejected': 0, 'released': 28, 'admitted': 28, 'queued': 15, 'timed-out': 2}
>
> The assertion is similar to the one mentioned in JIRA IMPALA-3772
> <https://issues.cloudera.org/browse/IMPALA-3772>.
>
> Is this issue similar to the one you mentioned earlier in this thread?
>
> Regards,
> Valencia
>
>
> From: Nishidha Panpaliya/Austin/Contr/IBM
> To: Tim Armstrong <[email protected]>
> Cc: [email protected], Manish Patil/Austin/Contr/IBM@IBMUS,
> Sudarshan Jagadale/Austin/Contr/IBM@IBMUS, Valencia Serrao/Austin/Contr/IBM@IBMUS
> Date: 06/24/2016 11:56 AM
> Subject: Re: Custom cluster test failure in test_exchange_delays.py
> ------------------------------
>
> Thanks a lot Tim.
>
> We tried running the query on the impala shell after starting the Impala
> cluster with the given parameters, but the query still passes. So we tried
> changing the delay to 20000 and we got the expected exception. The same
> thing was verified in the test case too, by changing the test argument for
> the delay.
>
> But as you said, if the problem is timing sensitive and it is seen on
> other platforms too, we would not change the test case (to increase the
> delay) just to make it pass. We can ignore the failure.
>
> Thanks again,
> Nishidha
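As a rough illustration of the two mechanisms Tim mentions at the top of this mail, the sketch below shows how a test can be skipped, or given a longer timeout, on slow build types. Only the names SkipIfBuildType.not_dev_build and specific_build_type_timeout come from the mail; the module paths, the slow_build_timeout keyword, and the concrete timeout values are assumptions, so treat this as a sketch rather than the actual Impala test code.

  # Hedged sketch: import paths, the slow_build_timeout keyword and the numbers
  # below are assumptions; check tests/common/ in the Impala repo for the real
  # definitions of SkipIfBuildType and specific_build_type_timeout.
  import pytest
  from tests.common.skip import SkipIfBuildType                 # assumed path
  from tests.common.environ import specific_build_type_timeout  # assumed path


  @SkipIfBuildType.not_dev_build   # skip entirely on non-dev (e.g. release/coverage) builds
  @pytest.mark.execute_serially
  def test_that_needs_a_dev_build():
    pass  # ... test body that only behaves predictably on dev builds ...


  def test_with_build_specific_timeout():
    # Pick a larger timeout when the build type is known to be slow
    # (e.g. code coverage); 60/600 seconds are illustrative values only.
    timeout_s = specific_build_type_timeout(60, slow_build_timeout=600)
    assert timeout_s >= 60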
> From: Tim Armstrong <[email protected]>
> To: [email protected]
> Cc: Sudarshan Jagadale/Austin/Contr/IBM@IBMUS, Valencia Serrao/Austin/Contr/IBM@IBMUS,
> Manish Patil/Austin/Contr/IBM@IBMUS, Nishidha Panpaliya/Austin/Contr/IBM@IBMUS
> Date: 06/23/2016 10:30 PM
> Subject: Re: Custom cluster test failure in test_exchange_delays.py
> ------------------------------
>
> Hmm, that test is potentially timing sensitive. We've seen problems when
> running with slow builds (e.g. code coverage) or when running it on a
> particularly slow machine, e.g. a single-core VM. It's probably OK to skip
> the test on PowerPC if this is the case.
>
> The query is expected to fail, but in this case no failure is happening.
> It's a "custom cluster test" that configures the cluster in a way that
> queries will fail with a timeout. It's test coverage for a bug where, if
> the timeout happened, Impala returned incorrect results.
>
> If you run the query on Impala with the default startup arguments it
> should succeed.
>
> If you start up Impala with the special configuration used by those tests,
> it should fail. E.g. locally I get:
>
> tarmstrong@tarmstrong-box:~/Impala/Impala$ ./bin/start-impala-cluster.py --impalad_args=--datastream_sender_timeout_ms=5000 --impalad_args=--stress_datastream_recvr_delay_ms=10000
> Starting State Store logging to /home/tarmstrong/Impala/Impala/logs/cluster/statestored.INFO
> Starting Catalog Service logging to /home/tarmstrong/Impala/Impala/logs/cluster/catalogd.INFO
> Starting Impala Daemon logging to /home/tarmstrong/Impala/Impala/logs/cluster/impalad.INFO
> Starting Impala Daemon logging to /home/tarmstrong/Impala/Impala/logs/cluster/impalad_node1.INFO
> Starting Impala Daemon logging to /home/tarmstrong/Impala/Impala/logs/cluster/impalad_node2.INFO
> MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
> MainThread: Getting num_known_live_backends from tarmstrong-box:25000
> MainThread: Waiting for num_known_live_backends=3. Current value: 0
> MainThread: Getting num_known_live_backends from tarmstrong-box:25000
> MainThread: Waiting for num_known_live_backends=3. Current value: 0
> MainThread: Getting num_known_live_backends from tarmstrong-box:25000
> MainThread: Waiting for num_known_live_backends=3. Current value: 2
> MainThread: Getting num_known_live_backends from tarmstrong-box:25000
> MainThread: Waiting for num_known_live_backends=3. Current value: 2
> MainThread: Getting num_known_live_backends from tarmstrong-box:25000
> MainThread: num_known_live_backends has reached value: 3
> Waiting for Catalog... Status: 63 DBs / 1091 tables (ready=True)
> MainThread: Getting num_known_live_backends from tarmstrong-box:25001
> MainThread: num_known_live_backends has reached value: 3
> Waiting for Catalog... Status: 63 DBs / 1091 tables (ready=True)
> MainThread: Getting num_known_live_backends from tarmstrong-box:25002
> MainThread: num_known_live_backends has reached value: 3
> Waiting for Catalog... Status: 63 DBs / 1091 tables (ready=True)
> Impala Cluster Running with 3 nodes.
> tarmstrong@tarmstrong-box:~/Impala/Impala$ impala-shell.sh
> Starting Impala Shell without Kerberos authentication
> Connected to tarmstrong-box.ca.cloudera.com:21000
> Server version: impalad version 2.6.0-cdh5-INTERNAL DEBUG (build fe23dbf0465220a0c40a5c8431cb6a536e19dc6b)
>
> ***********************************************************************************
> Welcome to the Impala shell. Copyright (c) 2015 Cloudera, Inc. All rights reserved.
> (Impala Shell v2.6.0-cdh5-INTERNAL (fe23dbf) built on Fri May 13 11:15:16 PDT 2016)
>
> You can run a single query from the command line using the '-q' option.
> ***********************************************************************************
> [tarmstrong-box.ca.cloudera.com:21000] > select count(*)
>                                        > from tpch.lineitem
>                                        > inner join tpch.orders on l_orderkey = o_orderkey
>                                        > ;
> Query: select count(*)
> from tpch.lineitem
> inner join tpch.orders on l_orderkey = o_orderkey
> WARNINGS:
> Sender timed out waiting for receiver fragment instance: 4cbdf04962743c:faa6717f926b5183
>
> (1 of 2 similar)
>
> You could try increasing the delay on your setup to see if you can replicate the failure.
>
>
> On Thu, Jun 23, 2016 at 3:54 AM, Nishidha Panpaliya <[email protected]> wrote:
>
> > Hi All,
> >
> > On power8, we are getting 3 custom cluster test failures: 2 test cases
> > failed in test_admission_controller.py and 1 in test_exchange_delays.py.
> > I investigated the test failure in test_exchange_delays.py and below are
> > my findings.
> >
> > 1. The test case that failed is "test_exchange_small_delay". This test
> >    uses the input test file "QueryTest/exchange-delays",
> >    --stress_datastream_recvr_delay_ms=10000 and
> >    --datastream_sender_timeout_ms=5000.
> > 2. The test is expected to throw an exception, with the message mentioned
> >    in the CATCH section of QueryTest/exchange-delays.
> > 3. However, at our end, the query in this test does not throw any
> >    exception; but since QueryTest/exchange-delays has a CATCH section,
> >    the test case fails due to the assertion in
> >    tests/common/impala_test_suite.py below:
> >        if 'CATCH' in test_section:
> >          assert test_section['CATCH'].strip() == ''
> > 4. If I remove the CATCH section from exchange-delays.test, then this
> >    test case passes; however, another test case in the same test file
> >    fails, as it throws an exception as per the inputs given to it but
> >    its CATCH section is missing.
> > 5. On another RHEL ppc machine, this test randomly passes, i.e. both
> >    test cases throw the exception as expected.
> >
> > I'm really confused as to what parameter is leading the test case
> > "test_exchange_small_delay" to not throw any exception in my setup, or
> > what should actually happen.
> > I checked the latest cdh5-trunk code on GitHub and it also has the same
> > test code and the same content in the query test file.
> >
> > Kindly provide me some pointers.
> >
> > Thanks,
> > Nishidha
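For reference, the --stress_datastream_recvr_delay_ms and --datastream_sender_timeout_ms flags Tim passes to start-impala-cluster.py above are the same kind of flags a custom cluster test pins through its decorator, which is why the timeout failure only reproduces under that special configuration. A rough sketch of the shape of such a test follows; the class name is illustrative and the exact decorator arguments and flag values in test_exchange_delays.py may differ.

  # Rough sketch of a custom cluster test wired to the flags shown above.
  # The class name and decorator argument formatting are illustrative; check
  # tests/custom_cluster/test_exchange_delays.py for the real definition.
  import pytest
  from tests.common.custom_cluster_test_suite import CustomClusterTestSuite  # assumed path


  class TestExchangeDelaysSketch(CustomClusterTestSuite):

    @pytest.mark.execute_serially
    @CustomClusterTestSuite.with_args(
        "--stress_datastream_recvr_delay_ms=10000 "
        "--datastream_sender_timeout_ms=5000")
    def test_exchange_small_delay(self, vector):
      # Runs the queries in QueryTest/exchange-delays and checks each section,
      # including any CATCH section, against the actual query result or error.
      self.run_test_case('QueryTest/exchange-delays', vector)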
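To make the CATCH section Nishidha describes concrete: Impala's .test files are split into delimited sections, and a CATCH section tells the framework that the query must fail with a matching error; if the query instead succeeds, the assert quoted from impala_test_suite.py above fires, which is exactly the failure being reported. A sketch of that shape follows, using the query and error string from Tim's shell session; the actual contents of QueryTest/exchange-delays.test may differ.

  ====
  ---- QUERY
  select count(*)
  from tpch.lineitem
  inner join tpch.orders on l_orderkey = o_orderkey
  ---- CATCH
  Sender timed out waiting for receiver fragment instance
  ====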
