Spark doesn't remove intermediate shuffle files while they're still referenced
by the same job; they're kept so later stages can reuse the shuffle output. The
blockmgr directories are normally deleted when the application's SparkContext
shuts down.

On Mon, Dec 18, 2017 at 3:10 PM, Mihai Iacob <mia...@ca.ibm.com> wrote:

> This code generates files under /tmp...blockmgr... which do not get
> cleaned up after the job finishes.
>
> Is there anything wrong with the code below? Or are there any known issues
> with Spark not cleaning up /tmp files?
>
>
> from pyspark.sql import Window
> from pyspark.sql.functions import rank
>
> window = Window.\
>               partitionBy('***', 'date_str').\
>               orderBy(sqlDf['***'])
>
> sqlDf = sqlDf.withColumn("***", rank().over(window))
> df_w_least = sqlDf.filter("***=1")
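For reference, the selection the window expression above performs can be sketched in plain Python. The column names here (group, date_str, value) are hypothetical stand-ins for the redacted '***' fields: rank rows within each partition by the ordering column, then keep only rank 1, i.e. the least value per group.

```python
# Plain-Python sketch of rank().over(window) followed by filter(rank == 1):
# within each (group, date_str) partition, keep the row with the smallest
# ordering value. Column names are hypothetical stand-ins for '***'.
from itertools import groupby
from operator import itemgetter

rows = [
    {"group": "a", "date_str": "2017-12-18", "value": 3},
    {"group": "a", "date_str": "2017-12-18", "value": 1},
    {"group": "b", "date_str": "2017-12-18", "value": 2},
]

key = itemgetter("group", "date_str")
least = []
for _, part in groupby(sorted(rows, key=key), key=key):
    # rank == 1 on an ascending ordering selects the partition minimum
    least.append(min(part, key=itemgetter("value")))

print(least)
```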
>
> Regards,
>
> *Mihai Iacob*
> DSX Local <https://datascience.ibm.com/local> - Security, IBM Analytics
>
