Sven Krasser created SPARK-5395:
-----------------------------------
Summary: Large number of Python workers causing resource depletion
Key: SPARK-5395
URL: https://issues.apache.org/jira/browse/SPARK-5395
Project: Spark
Issue Type: Bug
Components: PySpark
Affects Versions: 1.2.0
Environment: AWS ElasticMapReduce
Reporter: Sven Krasser
During job execution a large number of Python worker processes accumulates,
eventually causing YARN to kill containers for exceeding their memory
allocation (in the case below that is about 8G for executors plus 6G of
overhead per container). In this instance, 97 pyspark.daemon processes had
accumulated by the time the container was killed.
{noformat}
2015-01-23 15:36:53,654 INFO [Reporter] yarn.YarnAllocationHandler
(Logging.scala:logInfo(59)) - Container marked as failed:
container_1421692415636_0052_01_000030. Exit status: 143. Diagnostics:
Container [pid=35211,containerID=container_1421692415636_0052_01_000030] is
running beyond physical memory limits. Current usage: 14.9 GB of 14.5 GB
physical memory used; 41.3 GB of 72.5 GB virtual memory used. Killing container.
Dump of the process-tree for container_1421692415636_0052_01_000030 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS)
VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 54101 36625 36625 35211 (python) 78 1 332730368 16834 python -m
pyspark.daemon
|- 52140 36625 36625 35211 (python) 58 1 332730368 16837 python -m
pyspark.daemon
|- 36625 35228 36625 35211 (python) 65 604 331685888 17694 python -m
pyspark.daemon
[...]
{noformat}
Full output here: https://gist.github.com/skrasser/e3e2ee8dede5ef6b082c
Mailing list discussion:
https://www.mail-archive.com/[email protected]/msg20102.html
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]