[
https://issues.apache.org/jira/browse/SPARK-5395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Josh Rosen resolved SPARK-5395.
-------------------------------
Resolution: Fixed
Fix Version/s: 1.2.2
I've merged this into `branch-1.2` (1.2.2), so I'm marking this as fixed.
> Large number of Python workers causing resource depletion
> ---------------------------------------------------------
>
> Key: SPARK-5395
> URL: https://issues.apache.org/jira/browse/SPARK-5395
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 1.2.0, 1.3.0
> Environment: AWS ElasticMapReduce
> Reporter: Sven Krasser
> Assignee: Davies Liu
> Fix For: 1.3.0, 1.2.2
>
>
> During job execution a large number of Python worker accumulates eventually
> causing YARN to kill containers for being over their memory allocation (in
> the case below that is about 8G for executors plus 6G for overhead per
> container).
> In this instance, at the time of killing the container 97 pyspark.daemon
> processes had accumulated.
> {noformat}
> 2015-01-23 15:36:53,654 INFO [Reporter] yarn.YarnAllocationHandler
> (Logging.scala:logInfo(59)) - Container marked as failed:
> container_1421692415636_0052_01_000030. Exit status: 143. Diagnostics:
> Container [pid=35211,containerID=container_1421692415636_0052_01_000030] is
> running beyond physical memory limits. Current usage: 14.9 GB of 14.5 GB
> physical memory used; 41.3 GB of 72.5 GB virtual memory used. Killing
> container.
> Dump of the process-tree for container_1421692415636_0052_01_000030 :
> |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS)
> VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
> |- 54101 36625 36625 35211 (python) 78 1 332730368 16834 python -m
> pyspark.daemon
> |- 52140 36625 36625 35211 (python) 58 1 332730368 16837 python -m
> pyspark.daemon
> |- 36625 35228 36625 35211 (python) 65 604 331685888 17694 python -m
> pyspark.daemon
> [...]
> {noformat}
> The configuration used uses 64 containers with 2 cores each.
> Full output here: https://gist.github.com/skrasser/e3e2ee8dede5ef6b082c
> Mailinglist discussion:
> https://www.mail-archive.com/[email protected]/msg20102.html
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]