[ 
https://issues.apache.org/jira/browse/IMPALA-14784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18061402#comment-18061402
 ] 

ASF subversion and git services commented on IMPALA-14784:
----------------------------------------------------------

Commit 0adb0775368dca39aa253c0e11583df315354e15 in impala's branch 
refs/heads/master from Joe McDonnell
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=0adb07753 ]

IMPALA-14784: Upgrade to python-xdist==3.5.0 and use --dist=worksteal

On exhaustive jobs, the end-to-end parallel tests show enormous
skew. The last 1-2% of tests take hours, and logs indicate that
the last 1257 tests execute on a single worker.

pytest-xdist introduced a 'worksteal' algorithm in 3.2.0 that
can rebalance the work. Exhaustive end-to-end parallel tests
take about 5:20, while the same tests run in about 2:40 with
the worksteal policy. The improvement on core exhaustive
tests is much smaller, because they don't suffer the same
level of skew.
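
To illustrate why stealing helps with this kind of skew, here is a
toy simulation. It is a simplified sketch, not pytest-xdist's actual
scheduler: the function name simulate(), the contiguous
pre-assignment of tests, and the "steal half of the busiest queue"
rule are all assumptions made for illustration.

```python
from collections import deque


def simulate(durations, num_workers, steal):
    """Return the finish time of the slowest worker in a toy scheduler.

    Tests are pre-assigned to workers in contiguous chunks. With
    steal=True, a worker whose queue is empty takes half of the pending
    tests from the worker with the longest queue. This is an assumed,
    simplified model and not pytest-xdist's real implementation.
    """
    chunk = -(-len(durations) // num_workers)  # ceiling division
    queues = [
        deque(durations[i * chunk:(i + 1) * chunk])
        for i in range(num_workers)
    ]
    clock = [0.0] * num_workers  # time at which each worker is next free
    active = set(range(num_workers))

    while active:
        # Advance the worker that becomes free earliest.
        w = min(active, key=lambda i: clock[i])
        if not queues[w] and steal:
            victim = max(range(num_workers), key=lambda i: len(queues[i]))
            # Move the back half of the victim's pending tests to w.
            for _ in range(len(queues[victim]) // 2):
                queues[w].appendleft(queues[victim].pop())
        if queues[w]:
            clock[w] += queues[w].popleft()
        else:
            # Nothing left to run or steal, so this worker is done.
            active.discard(w)
    return max(clock)
```

Calling simulate() on a list where the slow tests are clustered at
the end, once with steal=False and once with steal=True, shows the
effect: without stealing, the worker that owns the slow chunk
finishes long after the others have gone idle.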

pytest-xdist changed the way it assigns tests to workers,
which exposed an issue with TestAcid::test_lock_timings().
The test sets the query option lock_max_wait_time_s on the
session but never unsets it. When multiple copies of the
test run on a single worker, the test case that expects a
timeout of 300 seconds with lock_max_wait_time_s unset
actually runs with the leftover lock_max_wait_time_s=5.
This change reworks the test to set lock_max_wait_time_s via
execute_query()'s query_options argument rather than on the
session itself.
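
The difference between session-level and per-query options can be
sketched with a toy model. ToySession, set_option(), leaky_case(),
and isolated_case() are hypothetical names and do not reflect
Impala's test framework; only the option name and the
query_options argument come from the description above. The default
of 300 is assumed from the "timeout of 300 seconds" case.

```python
class ToySession:
    """Hypothetical stand-in for a client session (not Impala's API)."""

    # Assumed default, based on the 300-second case described above.
    DEFAULTS = {"lock_max_wait_time_s": 300}

    def __init__(self):
        self.session_options = {}

    def set_option(self, name, value):
        # Session-level settings persist until explicitly unset.
        self.session_options[name] = value

    def execute_query(self, query, query_options=None):
        # Per-query options apply to this call only; the toy just
        # returns the options that would be in effect for the query.
        effective = dict(self.DEFAULTS)
        effective.update(self.session_options)
        effective.update(query_options or {})
        return effective


def leaky_case(session, wait_s):
    """Old pattern: set the option on the session and never unset it."""
    if wait_s is not None:
        session.set_option("lock_max_wait_time_s", wait_s)
    return session.execute_query("select 1")["lock_max_wait_time_s"]


def isolated_case(session, wait_s):
    """New pattern: pass the option for this query only."""
    opts = {} if wait_s is None else {"lock_max_wait_time_s": wait_s}
    result = session.execute_query("select 1", query_options=opts)
    return result["lock_max_wait_time_s"]
```

In this model, running leaky_case(session, 5) and then
leaky_case(session, None) on the same session leaves the second
call using the value 5, while the same sequence with
isolated_case() falls back to the default for the second call.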

Testing:
 - Ran end-to-end exhaustive tests
 - Ran a core job
 - Verified that TestAcid::test_lock_timings() can run multiple
   times with a single worker without failing

Change-Id: I6916bbef94b380a516356763dfabb3777c682637
Reviewed-on: http://gerrit.cloudera.org:8080/24035
Tested-by: Impala Public Jenkins <[email protected]>
Reviewed-by: Michael Smith <[email protected]>


> Switch end to end parallel tests to python-xdist's --dist=worksteal
> -------------------------------------------------------------------
>
>                 Key: IMPALA-14784
>                 URL: https://issues.apache.org/jira/browse/IMPALA-14784
>             Project: IMPALA
>          Issue Type: Task
>          Components: Infrastructure
>    Affects Versions: Impala 5.0.0
>            Reporter: Joe McDonnell
>            Assignee: Joe McDonnell
>            Priority: Major
>             Fix For: Impala 5.0.0
>
>
> pytest-xdist added a "worksteal" distribution mode that can rebalance work
> between workers when a worker goes idle. This is mainly useful towards
> the end of a parallel test run to reduce skew. On exhaustive end-to-end
> parallel tests, the last 2% of tests take hours:
> {noformat}
> 02:26:10 ..............................xxx....................................... [ 98%]
> 03:38:48 ........................................................................ [ 98%]
> 04:42:39 ........................................................................ [ 99%]
> 05:08:04 ........................................................................ [ 99%]
> 05:08:21 ....ss                                                                   [100%]{noformat}
> Looking at the report log, there is massive skew. At the end, a single worker 
> is working alone on its remaining tests while other workers are idle. Work 
> stealing would make a big difference here.


