Spark on K8s, some applications ended ungracefully

2022-03-31 Thread Pralabh Kumar
Hi Spark Team, some of my Spark applications on K8s ended with the below error. Although these applications completed successfully (the event log contains a SparkListenerApplicationEnd event at the end), their event files still carry the .inprogress suffix. This causes the applications to be shown as in-progress in the Spark History Server (SHS).
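The `.inprogress` suffix is normally removed when the driver finalizes the event log during `SparkContext.stop()`; if the driver pod is killed before that rename completes, the suffix remains and SHS lists the app as in-progress. A hedged sketch of the settings involved (paths and values are illustrative assumptions, not the poster's actual configuration):

```properties
# Event logging for the Spark History Server (illustrative paths).
spark.eventLog.enabled            true
spark.eventLog.dir                hdfs:///spark-logs
spark.history.fs.logDirectory     hdfs:///spark-logs

# The SHS cleaner can eventually remove stale logs, including ones a
# dead driver never renamed (assumption: defaults shown are typical).
spark.history.fs.cleaner.enabled  true
spark.history.fs.cleaner.maxAge   7d
```

Ensuring the driver reaches `spark.stop()` (e.g. in a `finally` block) is what prevents the suffix from being left behind in the first place.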

Re: loop of spark jobs leads to increase in memory on worker nodes and eventually failure

2022-03-31 Thread Enrico Minack
How well Spark can scale with your data (in terms of years of data) depends on two things: the operations performed on the data, and the characteristics of the data, such as value distributions. Failing tasks suggest you are using operations that do not scale (e.g. a Cartesian product of your …
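A Cartesian product produces |A| × |B| rows, so its cost explodes as either input grows, while an equi-join's output is bounded by the matching keys. A minimal PySpark sketch of the contrast (assumes a running `SparkSession` named `spark`; the DataFrames are hypothetical stand-ins):

```python
# Hypothetical inputs: `events` with N rows, `dims` with M rows.
events = spark.createDataFrame([(1, "a"), (2, "b")], ["key", "val"])
dims = spark.createDataFrame([(1, "x"), (2, "y")], ["key", "dim"])

# Does NOT scale: the result always has N * M rows.
cartesian = events.crossJoin(dims)

# Scales: the result is bounded by the number of matching keys.
joined = events.join(dims, on="key", how="inner")
```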

Re: loop of spark jobs leads to increase in memory on worker nodes and eventually failure

2022-03-31 Thread Sean Owen
If that is your loop unrolled, then you are not doing parts of the work at a time. That will execute all operations in one go when the write finally happens. That's OK, but it may be part of the problem. For example, if you are filtering for a subset, processing, and unioning, then that is just a harder …
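Because Spark transformations are lazy, a loop that only builds up and unions DataFrames defers everything to one giant job at the final write. One way to actually do "parts of the work at a time" is to write inside the loop, so each iteration runs as a bounded job. A hedged sketch (the `SparkSession` `spark`, paths, and column names are hypothetical):

```python
from pyspark.sql import functions as F

# Process and persist one day of data per iteration (illustrative loop).
for day in ["2022-03-29", "2022-03-30", "2022-03-31"]:
    df = spark.read.csv(f"hdfs:///input/{day}", header=True)
    result = (df.filter(F.col("amount").isNotNull())
                .groupBy("customer")
                .agg(F.sum("amount").alias("total")))
    # The write triggers a job for this day only; no plan accumulates
    # across iterations the way a union-everything-then-write loop does.
    result.write.mode("overwrite").parquet(f"hdfs:///output/{day}")
```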

Re: loop of spark jobs leads to increase in memory on worker nodes and eventually failure

2022-03-31 Thread Joris Billen
Thanks for the reply :-) I am using PySpark. Basically my code (simplified) is: df = spark.read.csv("hdfs://somehdfslocation"); df1 = spark.sql(complex statement using df); ...; dfx = spark.sql(complex statement using df x-1); ...; dfx15.write(). What exactly is meant by "closing resources"? Is it just …
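In PySpark, "closing resources" between iterations usually means unpersisting cached DataFrames and clearing the cache so executors can reclaim memory, rather than closing file handles. A hedged sketch under that assumption (the view name, SQL, and paths are hypothetical stand-ins for the chain df1..dfx15):

```python
# One iteration of the loop, with explicit cleanup at the end.
df = spark.read.csv("hdfs://somehdfslocation")
df.cache()
df.createOrReplaceTempView("base")

dfx = spark.sql("SELECT * FROM base")      # stand-in for the df1..dfx15 chain
dfx.write.mode("overwrite").parquet("hdfs:///out")

# Release what this iteration pinned in executor memory:
df.unpersist()              # drop the cached blocks for df
spark.catalog.clearCache()  # or drop every cached table/DataFrame at once
```

If memory still grows across many iterations, `dfx.checkpoint()` (with a checkpoint directory set) also truncates the accumulated lineage.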