Hi Spark Team,
Some of my Spark applications on K8s ended with the error below. Although
these applications completed successfully (the event log contains a
SparkListenerApplicationEnd event at the end), they still have event files
suffixed with .inprogress. This causes the applications to be shown as
in-progress in the SHS.
How well Spark can scale up with your data (in terms of years of data)
depends on two things: the operations performed on the data, and
characteristics of the data, like value distributions.
Failing tasks suggest you are using operations that do not scale
(e.g. a Cartesian product of your
If that is your loop unrolled, then you are not doing parts of the work at
a time: all operations will execute in one go when the write finally
happens. That's OK, but it may be part of the problem. For example, if you
are filtering for a subset, processing, and unioning, then that is just a
harder
Thanks for the reply :-)
I am using PySpark. Basically my code (simplified) is:
df = spark.read.csv("hdfs://somehdfslocation")
df1 = spark.sql("complex statement using df")
...
dfx = spark.sql("complex statement using df x-1")
...
dfx15.write()
What exactly is meant by "closing resources"? Is it just