Re: PySpark Lost Executors

2015-11-19 Thread Ross.Cramblit
Thank you Ted and Sandy for getting me pointed in the right direction. From the logs: WARN yarn.YarnAllocator: Container killed by YARN for exceeding memory limits. 25.4 GB of 25.3 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead. On Nov 19, 2015, at 12:20 PM,
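
For context on the numbers in that warning: on YARN the container limit is approximately spark.executor.memory plus spark.yarn.executor.memoryOverhead, and in Spark 1.5 the overhead defaults to max(384 MB, 10% of the executor memory). If, for example, spark.executor.memory were set to 23g (an assumed value, not stated in the thread), the default overhead of roughly 2.3 GB would yield the ~25.3 GB limit reported above.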

Re: PySpark Lost Executors

2015-11-19 Thread Ross.Cramblit
Hmm I guess I do not - I get 'application_1445957755572_0176 does not have any log files.' Where can I enable log aggregation? On Nov 19, 2015, at 11:07 AM, Ted Yu wrote: Do you have YARN log aggregation enabled? You can try retrieving the log for

Re: PySpark Lost Executors

2015-11-19 Thread Ted Yu
Do you have YARN log aggregation enabled? You can try retrieving the log for the container using the following command: yarn logs -applicationId application_1445957755572_0176 -containerId container_1445957755572_0176_01_03 Cheers On Thu, Nov 19, 2015 at 8:02 AM,

PySpark Lost Executors

2015-11-19 Thread Ross.Cramblit
I am running Spark 1.5.2 on YARN. My job consists of a number of SparkSQL transforms on a JSON data set that I load into a DataFrame. The data set is not large (~100 GB) and most stages execute without any issues. However, some more complex stages tend to lose executors/nodes regularly. What
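
A minimal PySpark sketch of the kind of job described above, using the Spark 1.5-era SQLContext API; the app name, input path, table name, and query are hypothetical, since the original post does not include code:

    from pyspark import SparkContext
    from pyspark.sql import SQLContext

    sc = SparkContext(appName="json-transforms")        # hypothetical app name
    sqlContext = SQLContext(sc)

    # Load the JSON data set into a DataFrame (path is hypothetical)
    df = sqlContext.read.json("hdfs:///data/events/*.json")
    df.registerTempTable("events")

    # One example SparkSQL transform; the actual queries are not shown in the thread
    result = sqlContext.sql("SELECT user_id, COUNT(*) AS n FROM events GROUP BY user_id")
    result.write.parquet("hdfs:///data/output/events_by_user")   # hypothetical output path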

Re: PySpark Lost Executors

2015-11-19 Thread Sandy Ryza
Hi Ross, This is most likely occurring because YARN is killing containers for exceeding physical memory limits. You can make this less likely to happen by bumping spark.yarn.executor.memoryOverhead to something higher than 10% of your spark.executor.memory. -Sandy On Thu, Nov 19, 2015 at 8:14
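
A sketch of one way to apply that advice in PySpark, setting the values before the SparkContext is created; the 20g executor memory and 4096 MB overhead are assumptions for illustration, not values from the thread (in Spark 1.x the overhead is specified in megabytes):

    from pyspark import SparkConf, SparkContext

    conf = (SparkConf()
            .setAppName("json-transforms")                        # hypothetical
            .set("spark.executor.memory", "20g")                  # assumed value
            # overhead in MB; pick something comfortably above 10% of executor memory
            .set("spark.yarn.executor.memoryOverhead", "4096"))
    sc = SparkContext(conf=conf)

The same settings can also be passed on the command line, e.g. spark-submit --conf spark.yarn.executor.memoryOverhead=4096.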

Re: PySpark Lost Executors

2015-11-19 Thread Ted Yu
Here are the parameters related to log aggregation:
  yarn.log-aggregation-enable                          true
  yarn.log-aggregation.retain-seconds                  2592000
  yarn.nodemanager.log-aggregation.compression-type    gz
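
These properties live in yarn-site.xml on the cluster nodes, and the YARN daemons generally need to be restarted for a change to take effect; once aggregation is enabled, the yarn logs -applicationId <appId> command above will return the aggregated container logs after an application finishes.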