[ 
https://issues.apache.org/jira/browse/IMPALA-14164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17985986#comment-17985986
 ] 

ASF subversion and git services commented on IMPALA-14164:
----------------------------------------------------------

Commit 1d0b2ef0c593145c476d52610c0f4ec2c69c8be7 in impala's branch 
refs/heads/master from Joe McDonnell
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=1d0b2ef0c ]

IMPALA-14164: Fix timeout for fragments in flight in TestScratchDir

On release builds, some tests in TestScratchDir have started hitting
a timeout waiting for num-fragments-in-flight to reach 2. The code
to wait for the metric sleeps one second between samples. If one of
the query fragments starts and finishes during that second, the test
will never see a sample containing two in-flight fragments. This
happens on release builds because they run faster, so a fragment is
more likely to start and finish within that second.
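
The sampling race described above can be reproduced with a small
simulation. This is only a sketch: the fragment timings are made-up
numbers, and poll_for_value is a hypothetical stand-in for the test
framework's wait loop, not the code in common/impala_service.py.

```python
def fragments_in_flight(t_ms, fragments):
    """Count fragments whose [start, end) interval contains t_ms."""
    return sum(1 for start, end in fragments if start <= t_ms < end)


def poll_for_value(fragments, expected, interval_ms, timeout_ms):
    """Sample the simulated metric every interval_ms until timeout_ms.

    Returns (samples, seen), where seen says whether any sample
    equalled the expected value.
    """
    samples = []
    for t_ms in range(0, timeout_ms + 1, interval_ms):
        value = fragments_in_flight(t_ms, fragments)
        samples.append(value)
        if value == expected:
            return samples, True
    return samples, False


# Hypothetical timings in milliseconds: one fragment starts and
# finishes inside the first second, the other keeps running.
FRAGMENTS = [(200, 700), (300, 120000)]

slow_samples, slow_seen = poll_for_value(FRAGMENTS, 2, 1000, 60000)
fast_samples, fast_seen = poll_for_value(FRAGMENTS, 2, 100, 60000)

print(slow_seen, slow_samples[:3])  # False [0, 1, 1]
print(fast_seen, fast_samples)      # True [0, 0, 1, 2]
```

With one-second samples the simulated metric reads 0, then 1, and
stays at 1, which matches the pattern in the failing test's log. A
shorter interval only narrows the window; it does not remove the race.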

This removes the code that waits for num-fragments-in-flight. All the
affected tests already make subsequent calls that wait for the scratch
usage to reach a certain value. Those calls are enough on their own to
wait for the fragment to start up, so the num-fragments-in-flight wait
adds nothing.
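
As a rough illustration of why waiting on scratch usage is more
robust, the sketch below polls a condition that stays true once it is
reached. Everything here is an assumption for illustration:
wait_until, FakeClock and scratch_bytes_used are hypothetical names,
and the premise that scratch usage stays above the threshold while the
test holds the query open is inferred, not taken from the test code.

```python
import time


def wait_until(predicate, timeout_s, interval_s=1.0,
               clock=time.monotonic, sleep=time.sleep):
    """Poll predicate() until it returns True or timeout_s elapses."""
    deadline = clock() + timeout_s
    while not predicate():
        if clock() >= deadline:
            return False
        sleep(interval_s)
    return True


class FakeClock:
    """Simulated time source so the example runs instantly."""

    def __init__(self):
        self.now = 0.0

    def time(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds


def scratch_bytes_used(t):
    # Hypothetical metric: usage appears at 0.3s and then persists.
    return 0 if t < 0.3 else 64 * 1024 * 1024


fake = FakeClock()
reached = wait_until(lambda: scratch_bytes_used(fake.time()) > 0,
                     timeout_s=60, interval_s=1.0,
                     clock=fake.time, sleep=fake.sleep)

# A condition that never becomes true still times out cleanly.
fake2 = FakeClock()
timed_out = not wait_until(lambda: False, timeout_s=3, interval_s=1.0,
                           clock=fake2.time, sleep=fake2.sleep)
```

Because the condition persists, the first sample taken after it
becomes true observes it, however coarse the polling interval is.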

Testing:
 - Ran custom_cluster/test_scratch_disk.py multiple times with a
   release build

Change-Id: Ic8c573affc033056ba465c42bd420d5c1d3ba15c
Reviewed-on: http://gerrit.cloudera.org:8080/23081
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>


> TestScratchDir tests do not reach expected number of fragments in flight
> ------------------------------------------------------------------------
>
>                 Key: IMPALA-14164
>                 URL: https://issues.apache.org/jira/browse/IMPALA-14164
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Test
>    Affects Versions: Impala 5.0.0
>            Reporter: Joe McDonnell
>            Assignee: Joe McDonnell
>            Priority: Blocker
>              Labels: broken-build
>
> Some TestScratchDir tests are failing with these symptoms:
>  
> {noformat}
> custom_cluster/test_scratch_disk.py:238: in test_scratch_dirs_default_priority
>     verifier.wait_for_metric("impala-server.num-fragments-in-flight", 2)
> verifiers/metric_verifier.py:67: in wait_for_metric
>     self.impalad_service.wait_for_metric_value(metric_name, expected_value, timeout)
> common/impala_service.py:158: in wait_for_metric_value
>     self.__metric_timeout_assert(metric_name, expected_value, timeout, value)
> common/impala_service.py:227: in __metric_timeout_assert
>     assert 0, assert_string
> E   AssertionError: Metric impala-server.num-fragments-in-flight did not reach value 2 in 60s. Actual value was '1'.
> {noformat}
> The logs show it never reaches 2:
>  
> {noformat}
> -- 2025-06-15 13:10:20,680 INFO     MainThread: Getting metric: impala-server.num-fragments-in-flight from impala-ec2-redhat86-m6i-4xlarge-ondemand-1c73.vpc.cloudera.com:25000
> -- 2025-06-15 13:10:20,692 INFO     MainThread: Waiting for metric value 'impala-server.num-fragments-in-flight'=2. Current value: 0. total_wait: 0s
> -- 2025-06-15 13:10:20,692 INFO     MainThread: Sleeping 1s before next retry.
> -- 2025-06-15 13:10:21,693 INFO     MainThread: Getting metric: impala-server.num-fragments-in-flight from impala-ec2-redhat86-m6i-4xlarge-ondemand-1c73.vpc.cloudera.com:25000
> -- 2025-06-15 13:10:21,704 INFO     MainThread: Waiting for metric value 'impala-server.num-fragments-in-flight'=2. Current value: 1. total_wait: 1.01228308678s
> -- 2025-06-15 13:10:21,704 INFO     MainThread: Sleeping 1s before next retry.
> ...
> -- 2025-06-15 13:11:20,955 INFO     MainThread: Metric impala-server.num-fragments-in-flight did not reach value 2 in 60s. Actual value was '1'. total_wait: 60.2740471363s. Failing...
> {noformat}
> This impacts these tests:
>  
> {noformat}
> TestScratchDir.test_scratch_dirs_default_priority
> TestScratchDir.test_scratch_dirs_prioritized_spill
> TestScratchDir.test_scratch_dirs_mix_local_and_remote_dir_spill_local_only
> {noformat}
>  


