Hi Tim,

Thanks for the fix; applying it resolved the two timed-out assertion failures mentioned earlier.
I'm executing these tests on a VM with the following configuration:

OS: Ubuntu 15.10
Architecture: ppc64le
RAM: 110 GB
HDD: 210 GB
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 1
Core(s) per socket: 1

Regards,
Valencia

From: Tim Armstrong <[email protected]>
To: Valencia Serrao/Austin/Contr/IBM@IBMUS
Cc: [email protected], Manish Patil/Austin/Contr/IBM@IBMUS, Sudarshan Jagadale/Austin/Contr/IBM@IBMUS, Nishidha Panpaliya/Austin/Contr/IBM@IBMUS
Date: 06/24/2016 09:17 PM
Subject: Re: Custom cluster test failure in test_admission_controller.py

Hi Valencia,

It does look like a timing-related failure, but maybe a different one from IMPALA-3772. You could try applying this fix we have in review: https://gerrit.cloudera.org/#/c/3450/1

It's curious that there are all these timing failures. Are you running on a small VM or something like that? We typically run the tests on a fairly modest 2-core VM and we don't generally see these tests failing. There are a few tests that we know will fail on very slow machines or build types; we've used "@SkipIfBuildType.not_dev_build" and "specific_build_type_timeout" to deal with some of those cases.

On Fri, Jun 24, 2016 at 12:07 AM, Valencia Serrao <[email protected]> wrote:

Hi Tim,

I am seeing 'timed out' assertions for 2 custom cluster tests in test_admission_controller.py: test_admission_controller_with_flags and test_admission_controller_with_configs.

I added a debug statement at line 512 of test_admission_controller.py, as follows:

    def run(self):
      client = None
      try:
        try:
          .............
        except ImpalaBeeswaxException as e:
          if "Rejected" in str(e):
            ............
          elif "exceeded timeout" in str(e):
            LOG.debug("Query %s timed out", self.query_num)
            self.query_state = 'TIMED OUT'
            print "Query " + self.query_state  # added this line
            return
          else:
            raise e
      finally:
        ..................

I found that the queries in both test cases are timing out:

Query TIMED OUT
Query TIMED OUT

The metrics printed in the logs are as follows:

Final Metric: {'dequeued': 13, 'rejected': 0, 'released': 28, 'admitted': 28, 'queued': 15, 'timed-out': 2}

The assertion is similar to the one mentioned in JIRA IMPALA-3772. Is this issue similar to the one you mentioned earlier in this thread?

Regards,
Valencia

From: Nishidha Panpaliya/Austin/Contr/IBM
To: Tim Armstrong <[email protected]>
Cc: [email protected], Manish Patil/Austin/Contr/IBM@IBMUS, Sudarshan Jagadale/Austin/Contr/IBM@IBMUS, Valencia Serrao/Austin/Contr/IBM@IBMUS
Date: 06/24/2016 11:56 AM
Subject: Re: Custom cluster test failure in test_exchange_delays.py

Thanks a lot Tim. We tried running the query on the impala shell after starting the impala cluster with the given parameters, but the query is still passing. So we tried changing the delay to 20000 and we got the expected exception. The same thing is verified in the test case too, by changing the test argument for the delay. But as you said, if the problem is timing sensitive and it is seen on other platforms too, we would not change the test case (to increase the delay) just to make it pass. We can ignore the failure.

Thanks again,
Nishidha
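For reference, below is a minimal sketch of the skip mechanism Tim mentions above ("@SkipIfBuildType.not_dev_build") applied to the exchange-delays test. It assumes the usual layout of the Impala test tree; the class name, decorator arguments, and module paths are illustrative and may not match the real test_exchange_delays.py.

    # Sketch only: how a timing-sensitive custom cluster test might be skipped
    # on slow (non-dev) builds. Module paths and decorator usage are assumptions
    # based on the Impala test tree, not the actual test file contents.
    from tests.common.custom_cluster_test_suite import CustomClusterTestSuite
    from tests.common.skip import SkipIfBuildType

    class TestExchangeDelays(CustomClusterTestSuite):

      @SkipIfBuildType.not_dev_build  # skip on slow builds such as code coverage
      @CustomClusterTestSuite.with_args(
          impalad_args="--stress_datastream_recvr_delay_ms=10000 "
                       "--datastream_sender_timeout_ms=5000")
      def test_exchange_small_delay(self, vector):
        # The CATCH section of the .test file expects the sender-timeout error.
        self.run_test_case('QueryTest/exchange-delays', vector)

The other helper Tim names, specific_build_type_timeout, appears to take the opposite approach: instead of skipping, the test asks for a larger timeout when the build type is known to be slow, so it still runs but with more headroom.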
From: Tim Armstrong <[email protected]>
To: [email protected]
Cc: Sudarshan Jagadale/Austin/Contr/IBM@IBMUS, Valencia Serrao/Austin/Contr/IBM@IBMUS, Manish Patil/Austin/Contr/IBM@IBMUS, Nishidha Panpaliya/Austin/Contr/IBM@IBMUS
Date: 06/23/2016 10:30 PM
Subject: Re: Custom cluster test failure in test_exchange_delays.py

Hmm, that test is potentially timing sensitive. We've seen problems when running with slow builds (e.g. code coverage) or when running it on a particularly slow machine, e.g. a single-core VM. It's probably OK to skip the test on PowerPC if this is the case.

The query is expected to fail, but in this case no failure is happening. It's a "custom cluster test" that configures the cluster in a way that queries will fail with a timeout. It's test coverage for a bug where, if the timeout happened, Impala returned incorrect results. If you run the query on Impala with the default startup arguments it should succeed. If you start up Impala with the special configuration used by those tests, it should fail. E.g. locally I get:

tarmstrong@tarmstrong-box:~/Impala/Impala$ ./bin/start-impala-cluster.py --impalad_args=--datastream_sender_timeout_ms=5000 --impalad_args=--stress_datastream_recvr_delay_ms=10000
Starting State Store logging to /home/tarmstrong/Impala/Impala/logs/cluster/statestored.INFO
Starting Catalog Service logging to /home/tarmstrong/Impala/Impala/logs/cluster/catalogd.INFO
Starting Impala Daemon logging to /home/tarmstrong/Impala/Impala/logs/cluster/impalad.INFO
Starting Impala Daemon logging to /home/tarmstrong/Impala/Impala/logs/cluster/impalad_node1.INFO
Starting Impala Daemon logging to /home/tarmstrong/Impala/Impala/logs/cluster/impalad_node2.INFO
MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
MainThread: Getting num_known_live_backends from tarmstrong-box:25000
MainThread: Waiting for num_known_live_backends=3. Current value: 0
MainThread: Getting num_known_live_backends from tarmstrong-box:25000
MainThread: Waiting for num_known_live_backends=3. Current value: 0
MainThread: Getting num_known_live_backends from tarmstrong-box:25000
MainThread: Waiting for num_known_live_backends=3. Current value: 2
MainThread: Getting num_known_live_backends from tarmstrong-box:25000
MainThread: Waiting for num_known_live_backends=3. Current value: 2
MainThread: Getting num_known_live_backends from tarmstrong-box:25000
MainThread: num_known_live_backends has reached value: 3
Waiting for Catalog... Status: 63 DBs / 1091 tables (ready=True)
MainThread: Getting num_known_live_backends from tarmstrong-box:25001
MainThread: num_known_live_backends has reached value: 3
Waiting for Catalog... Status: 63 DBs / 1091 tables (ready=True)
MainThread: Getting num_known_live_backends from tarmstrong-box:25002
MainThread: num_known_live_backends has reached value: 3
Waiting for Catalog... Status: 63 DBs / 1091 tables (ready=True)
Impala Cluster Running with 3 nodes.

tarmstrong@tarmstrong-box:~/Impala/Impala$ impala-shell.sh
Starting Impala Shell without Kerberos authentication
Connected to tarmstrong-box.ca.cloudera.com:21000
Server version: impalad version 2.6.0-cdh5-INTERNAL DEBUG (build fe23dbf0465220a0c40a5c8431cb6a536e19dc6b)
***********************************************************************************
Welcome to the Impala shell. Copyright (c) 2015 Cloudera, Inc. All rights reserved.
(Impala Shell v2.6.0-cdh5-INTERNAL (fe23dbf) built on Fri May 13 11:15:16 PDT 2016)
You can run a single query from the command line using the '-q' option.
***********************************************************************************
[tarmstrong-box.ca.cloudera.com:21000] > select count(*)
> from tpch.lineitem
> inner join tpch.orders on l_orderkey = o_orderkey
> ;
Query: select count(*) from tpch.lineitem inner join tpch.orders on l_orderkey = o_orderkey
WARNINGS: Sender timed out waiting for receiver fragment instance: 4cbdf04962743c:faa6717f926b5183 (1 of 2 similar)

You could try increasing the delay on your setup to see if you can replicate the failure.

On Thu, Jun 23, 2016 at 3:54 AM, Nishidha Panpaliya <[email protected]> wrote:

Hi All,

On Power8, we are getting 3 failures in the custom cluster tests: 2 test cases failed in test_admission_controller.py and 1 in test_exchange_delays.py. I investigated the failure in test_exchange_delays.py and below are my findings. The failing test case is "test_exchange_small_delay".

1. This test uses the input test file "QueryTest/exchange-delays" with --stress_datastream_recvr_delay_ms=10000 and --datastream_sender_timeout_ms=5000.
2. The test is expected to throw an exception, with the message given in the CATCH section of QueryTest/exchange-delays.
3. However, at our end the query in this test does not throw any exception, but since QueryTest/exchange-delays has a CATCH section, the test case fails due to the assertion in tests/common/impala_test_suite.py:

       if 'CATCH' in test_section:
         assert test_section['CATCH'].strip() == ''

4. If I remove the CATCH section from the exchange-delays.test file, then this test case passes; however, another test case in the same test file fails, as it throws the exception expected for its inputs but the CATCH section is missing.
5. On another RHEL ppc machine, this test randomly passes, i.e. both test cases throw the exception as expected.

I'm really confused as to what parameter is leading the test case "test_exchange_small_delay" to not throw any exception in my setup, or what should actually happen. I checked the latest cdh5-trunk code on GitHub and it has the same test code and the same content in the query test file. Kindly provide me some pointers.

Thanks,
Nishidha
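To make the interaction in point 3 concrete, here is a simplified, hypothetical paraphrase in Python of how a CATCH section is checked. It is not the real code from tests/common/impala_test_suite.py, and the function name and error text are only illustrative.

    # Simplified paraphrase of the CATCH handling (not the real harness code).
    def verify_catch_section(test_section, exception_text=None):
      """test_section is the parsed .test file section; exception_text is the
      error raised by the query, or None if the query succeeded."""
      expected_error = test_section.get('CATCH', '').strip()
      if exception_text is not None:
        # Query failed: the raised error must contain the expected message.
        assert expected_error and expected_error in exception_text
      else:
        # Query succeeded: any non-empty CATCH section fails the test here,
        # which is the assertion seen for test_exchange_small_delay on ppc64le.
        assert expected_error == ''

In other words, the assertion only fires when the query finishes without error even though the .test file says an error is expected, which matches Nishidha's later observation that raising the delay to 20000 produces the expected exception.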
