Weichen Xu created SPARK-28483:
----------------------------------
Summary: Canceling a spark job using barrier mode but tasks still
being printing messages
Key: SPARK-28483
URL: https://issues.apache.org/jira/browse/SPARK-28483
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 2.4.3
Reporter: Weichen Xu
Reproduce code:
{code:java}
import time
from pyspark import BarrierTaskContext
n = 4
def task(x):
context = BarrierTaskContext.get()
this = next(x)
if (this % 2 == 0):
time.sleep(10000)
context.barrier()
return []
sc.setLogLevel("INFO")
sc.parallelize(list(range(n)), n).barrier().mapPartitions(task).collect(){code}
Get logging like:
```
19/02/05 01:07:42 INFO BarrierTaskContext: Task 3 from Stage 0(Attempt 0) has
entered the global sync, current barrier epoch is 0.
19/02/05 01:07:42 INFO BarrierTaskContext: Task 1 from Stage 0(Attempt 0) has
entered the global sync, current barrier epoch is 0.
19/02/05 01:07:47 INFO Executor: Executor is trying to kill task 2.0 in stage
0.0 (TID 2), reason: Stage cancelled
19/02/05 01:07:47 INFO Executor: Executor is trying to kill task 0.0 in stage
0.0 (TID 0), reason: Stage cancelled
19/02/05 01:07:47 INFO Executor: Executor is trying to kill task 1.0 in stage
0.0 (TID 1), reason: Stage cancelled
19/02/05 01:07:47 INFO Executor: Executor is trying to kill task 3.0 in stage
0.0 (TID 3), reason: Stage cancelled
19/02/05 01:07:50 WARN PythonRunner: Incomplete task 0.0 in stage 0 (TID 0)
interrupted: Attempting to kill Python Worker
19/02/05 01:07:50 INFO Executor: Executor killed task 0.0 in stage 0.0 (TID 0),
reason: Stage cancelled
19/02/05 01:07:50 WARN PythonRunner: Incomplete task 3.3 in stage 0 (TID 3)
interrupted: Attempting to kill Python Worker
19/02/05 01:07:50 INFO Executor: Executor killed task 3.0 in stage 0.0 (TID 3),
reason: Stage cancelled
19/02/05 01:07:50 WARN PythonRunner: Incomplete task 2.2 in stage 0 (TID 2)
interrupted: Attempting to kill Python Worker
19/02/05 01:07:50 INFO Executor: Executor killed task 2.0 in stage 0.0 (TID 2),
reason: Stage cancelled
19/02/05 01:07:50 WARN PythonRunner: Incomplete task 1.1 in stage 0 (TID 1)
interrupted: Attempting to kill Python Worker
19/02/05 01:07:50 INFO Executor: Executor killed task 1.0 in stage 0.0 (TID 1),
reason: Stage cancelled
19/02/05 01:08:42 INFO BarrierTaskContext: Task 3 from Stage 0(Attempt 0)
waiting under the global sync since 1549328862443, has been waiting for 60
seconds, current barrier epoch is 0.
19/02/05 01:08:42 INFO BarrierTaskContext: Task 1 from Stage 0(Attempt 0)
waiting under the global sync since 1549328862522, has been waiting for 60
seconds, current barrier epoch is 0.
19/02/05 01:09:42 INFO BarrierTaskContext: Task 3 from Stage 0(Attempt 0)
waiting under the global sync since 1549328862443, has been waiting for 120
seconds, current barrier epoch is 0.
19/02/05 01:09:42 INFO BarrierTaskContext: Task 1 from Stage 0(Attempt 0)
waiting under the global sync since 1549328862522, has been waiting for 120
seconds, current barrier epoch is 0.
19/02/05 01:10:42 INFO BarrierTaskContext: Task 3 from Stage 0(Attempt 0)
waiting under the global sync since 1549328862443, has been waiting for 180
seconds, current barrier epoch is 0.
19/02/05 01:10:42 INFO BarrierTaskContext: Task 1 from Stage 0(Attempt 0)
waiting under the global sync since 1549328862522, has been waiting for 180
seconds, current barrier epoch is 0.
19/02/05 01:11:42 INFO BarrierTaskContext: Task 3 from Stage 0(Attempt 0)
waiting under the global sync since 1549328862443, has been waiting for 240
seconds, current barrier epoch is 0.
19/02/05 01:11:42 INFO BarrierTaskContext: Task 1 from Stage 0(Attempt 0)
waiting under the global sync since 1549328862522, has been waiting for 240
seconds, current barrier epoch is 0.
```
--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]