[
https://issues.apache.org/jira/browse/IMPALA-10783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael Smith resolved IMPALA-10783.
------------------------------------
Fix Version/s: Impala 4.1.0
Resolution: Fixed
> run_and_verify_query_cancellation_test flakiness and improper error handling
> in TestImpalaShell
> -----------------------------------------------------------------------------------------------
>
> Key: IMPALA-10783
> URL: https://issues.apache.org/jira/browse/IMPALA-10783
> Project: IMPALA
> Issue Type: Bug
> Affects Versions: Impala 4.0.0
> Reporter: Bikramjeet Vig
> Assignee: Bikramjeet Vig
> Priority: Major
> Labels: flaky-test
> Fix For: Impala 4.1.0
>
>
> Some tests in TestImpalaShell run impala-shell in a separate process but
> don't handle the case where the test fails and the impala-shell process
> lingers on.
> One such test, run_and_verify_query_cancellation_test, failed due to
> flakiness. Because it ran a query that returned a large result, the
> impala-shell process lingered on while fetching results. The query therefore
> held on to resources and starved the cluster of memory, which caused other
> tests to fail because not enough memory was available.
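
One way to keep a child process from outliving a failed test is to wrap it in a context manager that always kills and reaps it. The sketch below is only an illustration of that idea, not the fix that was committed for this issue: `managed_process` and `run_failing_test` are hypothetical names, and a sleeping Python child stands in for an impala-shell process that is still fetching results.

```python
import contextlib
import subprocess
import sys


@contextlib.contextmanager
def managed_process(args):
    """Start a child process and make sure it is gone when the block exits.

    Hypothetical helper for illustration; not part of the Impala test suite.
    """
    proc = subprocess.Popen(args)
    try:
        yield proc
    finally:
        # Runs on normal exit and when the body raises (e.g. a failed assert).
        if proc.poll() is None:
            proc.kill()
        # Reap the child so it does not linger as a zombie.
        proc.wait()


def run_failing_test():
    """Simulate a test that fails while its child process is still running."""
    # Stand-in for an impala-shell process busy fetching a large result.
    child_args = [sys.executable, "-c", "import time; time.sleep(60)"]
    proc = None
    try:
        with managed_process(child_args) as proc:
            raise AssertionError("simulated test failure")
    except AssertionError:
        pass
    return proc
```

After `run_failing_test()` returns, the child has been terminated even though the body of the `with` block raised, so any query it was driving would no longer be held open by the client.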
> The flakiness in run_and_verify_query_cancellation_test was:
> {noformat}
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/tests/shell/test_shell_commandline.py:414:
> in test_query_cancellation_during_wait_to_finish
> self.run_and_verify_query_cancellation_test(vector, stmt, "RUNNING")
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/tests/shell/test_shell_commandline.py:422:
> in run_and_verify_query_cancellation_test
> wait_for_query_state(vector, stmt, cancel_at_state)
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/tests/shell/util.py:330:
> in wait_for_query_state
> raise Exception(exc_text)
> E Exception: The found in flight query is not the one under test: set all
> {noformat}
> The test checked for running queries too soon, while impala-shell was still
> starting up. On startup, impala-shell runs "set all"; the test picked that
> up as the in-flight query and raised an error because it was not the query
> under test.
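
The race described above can be avoided by polling until the statement under test shows up, skipping unrelated in-flight statements such as "set all" instead of raising on them. The following is a minimal sketch under stated assumptions, not the actual change made to tests/shell/util.py: `list_in_flight` is a hypothetical callable that returns (statement, state) pairs, the signature differs from the real `wait_for_query_state`, and `make_fake_server` exists only to drive the example.

```python
import time


def wait_for_query_state(list_in_flight, stmt, target_state,
                         timeout_s=10.0, interval_s=0.1):
    """Poll until `stmt` is in flight in `target_state`, or time out.

    `list_in_flight` is a hypothetical callable returning an iterable of
    (statement, state) pairs. Unrelated statements are skipped, not treated
    as errors.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        for in_flight_stmt, state in list_in_flight():
            if in_flight_stmt == stmt and state == target_state:
                return
        if time.monotonic() >= deadline:
            raise TimeoutError(
                "query %r did not reach state %s" % (stmt, target_state))
        time.sleep(interval_s)


def make_fake_server(snapshots):
    """Return a callable that replays `snapshots`, repeating the last one."""
    it = iter(snapshots)
    last = []

    def list_in_flight():
        nonlocal last
        last = next(it, last)
        return last

    return list_in_flight
```

With a fake server that first reports only "set all", then nothing, then the real statement, the wait returns once the real statement appears. If the statement never appears, the caller gets a `TimeoutError` rather than a misleading "not the one under test" failure.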
> The lingering query then caused other tests to fail with errors like:
> {noformat}
> query_test/test_tpcds_queries.py:107: in test_tpcds_q18a
> self.run_test_case(self.get_workload() + '-q18a', vector)
> common/impala_test_suite.py:678: in run_test_case
> result = exec_fn(query, user=test_section.get('USER', '').strip() or None)
> common/impala_test_suite.py:616: in __exec_in_impala
> result = self.__execute_query(target_impalad_client, query, user=user)
> common/impala_test_suite.py:936: in __execute_query
> return impalad_client.execute(query, user=user)
> common/impala_connection.py:205: in execute
> return self.__beeswax_client.execute(sql_stmt, user=user)
> beeswax/impala_beeswax.py:189: in execute
> handle = self.__execute_query(query_string.strip(), user=user)
> beeswax/impala_beeswax.py:367: in __execute_query
> self.wait_for_finished(handle)
> beeswax/impala_beeswax.py:388: in wait_for_finished
> raise ImpalaBeeswaxException("Query aborted:" + error_log, None)
> E ImpalaBeeswaxException: ImpalaBeeswaxException:
> E Query aborted:Failed to get minimum memory reservation of 452.19 MB on
> daemon impala-ec2-centos74-m5-4xlarge-ondemand-191d.vpc.cloudera.com:27002
> for query 394b7f96d554f99c:6882496c00000000 due to following error: Failed to
> increase reservation by 452.19 MB because it would exceed the applicable
> reservation limit for the "Process" ReservationTracker:
> reservation_limit=10.20 GB reservation=9.91 GB used_reservation=0
> child_reservations=9.91 GB
> E The top 5 queries that allocated memory under this tracker are:
> E Query(fa4ece9474a3f865:1b284e6700000000): Reservation=9.60 GB
> ReservationLimit=9.60 GB OtherMemory=118.01 MB Total=9.71 GB Peak=9.71 GB
> E Query(534d07950247ae68:6f5a410d00000000): Reservation=123.50 MB
> ReservationLimit=9.60 GB OtherMemory=2.68 MB Total=126.18 MB Peak=317.02 MB
> E Query(2e4f087aa8263e23:e697d8e800000000): Reservation=50.81 MB
> ReservationLimit=9.60 GB OtherMemory=42.62 MB Total=93.43 MB Peak=173.74 MB
> E Query(6e459d892dfa5050:5959219b00000000): Reservation=28.88 MB
> ReservationLimit=9.60 GB OtherMemory=18.77 MB Total=47.64 MB Peak=53.11 MB
> E Query(ad455bea2e0adc64:2b0bbf3500000000): Reservation=17.94 MB
> ReservationLimit=9.60 GB OtherMemory=15.22 MB Total=33.16 MB Peak=163.99 MB
> E
> E
> E
> E
> E
> E Memory is likely oversubscribed. Reducing query concurrency or
> configuring admission control may help avoid this error.
> {noformat}
> Logs confirmed that fa4ece9474a3f865:1b284e6700000000 is the query id of the
> query that run_and_verify_query_cancellation_test ran.