[
https://issues.apache.org/jira/browse/SPARK-9096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gisle Ytrestøl updated SPARK-9096:
----------------------------------
Description:
After calling JavaRDD.subtract(), tasks appear to be unevenly distributed in
subsequent operations on the JavaRDD that "subtract" returns. A few tasks end
up processing almost all of the data, and these tasks take a long time to
finish.
I've reproduced this bug in the attached Java file, which I submit with
spark-submit.
The logs for 1.3.1 and 1.4.1 are attached. In 1.4.1, we see that a few tasks in
the count job take a lot of time:
15/07/16 09:13:17 INFO TaskSetManager: Finished task 1459.0 in stage 2.0 (TID
4659) in 708 ms on 148.251.190.217 (1597/1600)
15/07/16 09:13:17 INFO TaskSetManager: Finished task 1586.0 in stage 2.0 (TID
4786) in 772 ms on 148.251.190.217 (1598/1600)
15/07/16 09:17:51 INFO TaskSetManager: Finished task 1382.0 in stage 2.0 (TID
4582) in 275019 ms on 148.251.190.217 (1599/1600)
15/07/16 09:20:02 INFO TaskSetManager: Finished task 1230.0 in stage 2.0 (TID
4430) in 407020 ms on 148.251.190.217 (1600/1600)
15/07/16 09:20:02 INFO TaskSchedulerImpl: Removed TaskSet 2.0, whose tasks have
all completed, from pool
15/07/16 09:20:02 INFO DAGScheduler: ResultStage 2 (count at
ReproduceBug.java:56) finished in 420.024 s
15/07/16 09:20:02 INFO DAGScheduler: Job 0 finished: count at
ReproduceBug.java:56, took 442.941395 s
In comparison, all tasks are more or less equal in size when running the same
application in Spark 1.3.1. Overall, the attached application
(ReproduceBug.java) takes about 7 minutes on Spark 1.4.1 and completes in
roughly 30 seconds on Spark 1.3.1.
Spark 1.4.0 behaves similarly to Spark 1.4.1 with respect to this issue.
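To make the skew easier to spot in the attached logs, here is a minimal sketch
that scans TaskSetManager "Finished task" lines and reports tasks at or above a
duration threshold. The class name SlowTaskFinder, the TaskTiming type, and the
10-second threshold are hypothetical choices for illustration and are not part
of ReproduceBug.java. The sketch assumes each log entry is on a single line, as
in the original log files, rather than wrapped as in this message.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of ReproduceBug.java): finds slow tasks in Spark driver logs.
public class SlowTaskFinder {
    // Captures the task index, the TID, and the duration in ms from a line such as:
    // "... Finished task 1382.0 in stage 2.0 (TID 4582) in 275019 ms on <host> (1599/1600)"
    private static final Pattern FINISHED = Pattern.compile(
        "Finished task (\\d+)\\.\\d+ in stage \\d+\\.\\d+ \\(TID (\\d+)\\) in (\\d+) ms");

    // Simple value holder for one finished task.
    public static final class TaskTiming {
        public final int taskIndex;
        public final long tid;
        public final long durationMs;

        TaskTiming(int taskIndex, long tid, long durationMs) {
            this.taskIndex = taskIndex;
            this.tid = tid;
            this.durationMs = durationMs;
        }
    }

    // Returns every finished task whose duration is at least thresholdMs, in log order.
    public static List<TaskTiming> findSlowTasks(List<String> logLines, long thresholdMs) {
        List<TaskTiming> slow = new ArrayList<>();
        for (String line : logLines) {
            Matcher m = FINISHED.matcher(line);
            if (m.find()) {
                long durationMs = Long.parseLong(m.group(3));
                if (durationMs >= thresholdMs) {
                    slow.add(new TaskTiming(
                        Integer.parseInt(m.group(1)), Long.parseLong(m.group(2)), durationMs));
                }
            }
        }
        return slow;
    }

    public static void main(String[] args) {
        // The four TaskSetManager lines quoted above, joined back onto single lines.
        List<String> lines = Arrays.asList(
            "15/07/16 09:13:17 INFO TaskSetManager: Finished task 1459.0 in stage 2.0 (TID 4659) in 708 ms on 148.251.190.217 (1597/1600)",
            "15/07/16 09:13:17 INFO TaskSetManager: Finished task 1586.0 in stage 2.0 (TID 4786) in 772 ms on 148.251.190.217 (1598/1600)",
            "15/07/16 09:17:51 INFO TaskSetManager: Finished task 1382.0 in stage 2.0 (TID 4582) in 275019 ms on 148.251.190.217 (1599/1600)",
            "15/07/16 09:20:02 INFO TaskSetManager: Finished task 1230.0 in stage 2.0 (TID 4430) in 407020 ms on 148.251.190.217 (1600/1600)");
        for (TaskTiming t : findSlowTasks(lines, 10_000L)) {
            System.out.println(
                "task " + t.taskIndex + " (TID " + t.tid + ") took " + t.durationMs + " ms");
        }
    }
}
```

Running the same scan over both attached logs with the same threshold would be
one way to compare the task-duration distribution between 1.3.1 and 1.4.1.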
was:
After calling JavaRDD.subtract(), tasks appear to be unevenly distributed in
subsequent operations on the JavaRDD that "subtract" returns. A few tasks end
up processing almost all of the data, and these tasks take a long time to
finish.
> Unevenly distributed task loads after using JavaRDD.subtract()
> --------------------------------------------------------------
>
> Key: SPARK-9096
> URL: https://issues.apache.org/jira/browse/SPARK-9096
> Project: Spark
> Issue Type: Bug
> Components: Java API
> Affects Versions: 1.4.0, 1.4.1
> Reporter: Gisle Ytrestøl
> Attachments: ReproduceBug.java, reproduce.1.3.1.log.gz,
> reproduce.1.4.1.log.gz