[
https://issues.apache.org/jira/browse/SPARK-3358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14243454#comment-14243454
]
Sean Owen commented on SPARK-3358:
----------------------------------
Is this then resolved by one of https://github.com/apache/spark/pull/2244 or
https://github.com/apache/spark/pull/2259 ?
> PySpark worker fork()ing performance regression in m3.* / PVM instances
> -----------------------------------------------------------------------
>
> Key: SPARK-3358
> URL: https://issues.apache.org/jira/browse/SPARK-3358
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 1.1.0
> Environment: m3.* instances on EC2
> Reporter: Josh Rosen
>
> SPARK-2764 (and some followup commits) simplified PySpark's worker process
> structure by removing an intermediate pool of processes forked by daemon.py.
> Previously, daemon.py forked a fixed-size pool of processes that shared a
> socket and handled worker launch requests from Java. After my patch, this
> intermediate pool was removed and launch requests are handled directly in
> daemon.py.
> Unfortunately, this seems to have increased PySpark task launch latency when
> running on m3* class instances in EC2. Most of this difference can be
> attributed to m3 instances' more expensive fork() system calls. I tried the
> following microbenchmark on m3.xlarge and r3.xlarge instances:
> {code}
> import os
> for x in range(1000):
> if os.fork() == 0:
> exit()
> {code}
> On the r3.xlarge instance:
> {code}
> real 0m0.761s
> user 0m0.008s
> sys 0m0.144s
> {code}
> And on m3.xlarge:
> {code}
> real 0m1.699s
> user 0m0.012s
> sys 0m1.008s
> {code}
> I think this is due to HVM vs PVM EC2 instances using different
> virtualization technologies with different fork costs.
> It may be the case that this performance difference only appears in certain
> microbenchmarks and is masked by other performance improvements in PySpark,
> such as improvements to large group-bys. I'm in the process of re-running
> spark-perf benchmarks on m3 instances in order to confirm whether this
> impacts more realistic jobs.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]