[ 
https://issues.apache.org/jira/browse/IMPALA-10783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith resolved IMPALA-10783.
------------------------------------
    Fix Version/s: Impala 4.1.0
       Resolution: Fixed

> run_and_verify_query_cancellation_test flakiness and improper error handling 
> in TestImpalaShell
> -----------------------------------------------------------------------------------------------
>
>                 Key: IMPALA-10783
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10783
>             Project: IMPALA
>          Issue Type: Bug
>    Affects Versions: Impala 4.0.0
>            Reporter: Bikramjeet Vig
>            Assignee: Bikramjeet Vig
>            Priority: Major
>              Labels: flaky-test
>             Fix For: Impala 4.1.0
>
>
> Some tests in TestImpalaShell run impala-shell in a seperate process but 
> don't handle the case where the test can fail and the impala-shell process 
> can linger on.
> One such test run_and_verify_query_cancellation_test, failed due to flakiness 
> and since it ran a query that returned a large result, the impala-shell 
> process lingered on while fetching results. This caused the query to hold on 
> to resources and starve the cluster of memory which caused other tests to 
> fail due to not enough memory being available.
> The flakiness in run_and_verify_query_cancellation_test was:
> {noformat}
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/tests/shell/test_shell_commandline.py:414:
>  in test_query_cancellation_during_wait_to_finish
>     self.run_and_verify_query_cancellation_test(vector, stmt, "RUNNING")
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/tests/shell/test_shell_commandline.py:422:
>  in run_and_verify_query_cancellation_test
>     wait_for_query_state(vector, stmt, cancel_at_state)
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/tests/shell/util.py:330:
>  in wait_for_query_state
>     raise Exception(exc_text)
> E   Exception: The found in flight query is not the one under test: set all
> {noformat}
> the test checked for running queries too fast while the impala-shell was 
> starting up. the impala-shell runs "set all" when it starts which the test 
> picked up and raised an error thinking it did find its query.
> The result of this lingering query caused other tests to fail and throw 
> errors like:
> {noformat}
> query_test/test_tpcds_queries.py:107: in test_tpcds_q18a
>     self.run_test_case(self.get_workload() + '-q18a', vector)
> common/impala_test_suite.py:678: in run_test_case
>     result = exec_fn(query, user=test_section.get('USER', '').strip() or None)
> common/impala_test_suite.py:616: in __exec_in_impala
>     result = self.__execute_query(target_impalad_client, query, user=user)
> common/impala_test_suite.py:936: in __execute_query
>     return impalad_client.execute(query, user=user)
> common/impala_connection.py:205: in execute
>     return self.__beeswax_client.execute(sql_stmt, user=user)
> beeswax/impala_beeswax.py:189: in execute
>     handle = self.__execute_query(query_string.strip(), user=user)
> beeswax/impala_beeswax.py:367: in __execute_query
>     self.wait_for_finished(handle)
> beeswax/impala_beeswax.py:388: in wait_for_finished
>     raise ImpalaBeeswaxException("Query aborted:" + error_log, None)
> E   ImpalaBeeswaxException: ImpalaBeeswaxException:
> E    Query aborted:Failed to get minimum memory reservation of 452.19 MB on 
> daemon impala-ec2-centos74-m5-4xlarge-ondemand-191d.vpc.cloudera.com:27002 
> for query 394b7f96d554f99c:6882496c00000000 due to following error: Failed to 
> increase reservation by 452.19 MB because it would exceed the applicable 
> reservation limit for the "Process" ReservationTracker: 
> reservation_limit=10.20 GB reservation=9.91 GB used_reservation=0 
> child_reservations=9.91 GB
> E   The top 5 queries that allocated memory under this tracker are:
> E   Query(fa4ece9474a3f865:1b284e6700000000): Reservation=9.60 GB 
> ReservationLimit=9.60 GB OtherMemory=118.01 MB Total=9.71 GB Peak=9.71 GB
> E   Query(534d07950247ae68:6f5a410d00000000): Reservation=123.50 MB 
> ReservationLimit=9.60 GB OtherMemory=2.68 MB Total=126.18 MB Peak=317.02 MB
> E   Query(2e4f087aa8263e23:e697d8e800000000): Reservation=50.81 MB 
> ReservationLimit=9.60 GB OtherMemory=42.62 MB Total=93.43 MB Peak=173.74 MB
> E   Query(6e459d892dfa5050:5959219b00000000): Reservation=28.88 MB 
> ReservationLimit=9.60 GB OtherMemory=18.77 MB Total=47.64 MB Peak=53.11 MB
> E   Query(ad455bea2e0adc64:2b0bbf3500000000): Reservation=17.94 MB 
> ReservationLimit=9.60 GB OtherMemory=15.22 MB Total=33.16 MB Peak=163.99 MB
> E   
> E   
> E   
> E   
> E   
> E   Memory is likely oversubscribed. Reducing query concurrency or 
> configuring admission control may help avoid this error.
> {noformat}
> Logs confirmed that fa4ece9474a3f865:1b284e6700000000 is the query id of the 
> query that run_and_verify_query_cancellation_test ran.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to