[
https://issues.apache.org/jira/browse/SPARK-7941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564459#comment-14564459
]
Cory Nguyen edited comment on SPARK-7941 at 5/29/15 9:28 AM:
-------------------------------------------------------------
I'm not entirely sure what you meant by "app data from the hadoop user, rather
than you or yarn or spark user". "hadoop" is the standard user when Spark is
deployed on AWS EMR, and it is the "hadoop" user that submits the Spark jobs to
YARN - I think that is why what you saw looked confusing. That appcache folder
is Spark related, because Spark is the only thing run on this cluster; I
monitored an individual node while the job was running and could watch the
appcache grow as the job ran.
Yes, this is Spark related. No, the containers are not still running. I know
for certain the cache data comes from the running Spark job. I thought YARN
would clean this up too, but that was not the case: the data was still there
hours after the job was killed by Spark/YARN.
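For reference, a minimal sketch of the kind of per-node monitoring described above. The appcache path is an assumption (it depends on yarn.nodemanager.local-dirs and the submitting user; /mnt/yarn/usercache/hadoop/appcache is a typical layout on EMR for jobs submitted as "hadoop"), and this is not the reporter's actual script:

{code:python}
import os
import time

# Assumed appcache location; the real path depends on
# yarn.nodemanager.local-dirs (EMR usually has one entry per attached
# volume, e.g. /mnt/yarn, /mnt1/yarn, ...) and the submitting user.
APPCACHE_DIRS = ["/mnt/yarn/usercache/hadoop/appcache"]

def dir_size_bytes(path):
    """Recursively sum file sizes under path, ignoring vanished files."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                pass  # shuffle/spill files come and go while the job runs
    return total

if __name__ == "__main__":
    # Print appcache usage once a minute while the Spark job is running.
    while True:
        for d in APPCACHE_DIRS:
            if os.path.isdir(d):
                mb = dir_size_bytes(d) / (1024.0 * 1024.0)
                print("%s\t%.1f MB" % (d, mb))
        time.sleep(60)
{code}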
> Cache Cleanup Failure when job is killed by Spark
> --------------------------------------------------
>
> Key: SPARK-7941
> URL: https://issues.apache.org/jira/browse/SPARK-7941
> Project: Spark
> Issue Type: Bug
> Components: PySpark, YARN
> Affects Versions: 1.3.1
> Reporter: Cory Nguyen
> Attachments: screenshot-1.png
>
>
> Problem/Bug:
> If a job is running and Spark kills the job intentionally, the cache files
> remain on the local/worker nodes and are not cleaned up properly. Over time
> the old cache builds up and causes a "No Space Left on Device" error.
> The cache is cleaned up properly when the job succeeds. I have not verified
> whether the cache remains when the user intentionally kills the job.
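As a rough illustration of the manual workaround this failure currently forces on operators, here is a sketch (not part of the report) that lists appcache application directories with no corresponding running YARN application, so they can be inspected and removed by hand. The root path is an assumption, and the script only prints candidates rather than deleting anything:

{code:python}
import os
import subprocess

# Assumed appcache root; adjust to your yarn.nodemanager.local-dirs layout.
APPCACHE_ROOT = "/mnt/yarn/usercache/hadoop/appcache"

def running_application_ids():
    """Ask the ResourceManager (via the yarn CLI) for running app ids."""
    out = subprocess.check_output(
        ["yarn", "application", "-list", "-appStates", "RUNNING"])
    return {tok for line in out.decode().splitlines()
            for tok in line.split() if tok.startswith("application_")}

def stale_app_dirs(root, running):
    """appcache/application_* directories whose app is no longer running."""
    if not os.path.isdir(root):
        return []
    return [os.path.join(root, d) for d in sorted(os.listdir(root))
            if d.startswith("application_") and d not in running]

if __name__ == "__main__":
    running = running_application_ids()
    for path in stale_app_dirs(APPCACHE_ROOT, running):
        print("stale appcache dir: %s" % path)  # inspect before deleting
{code}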