Hi All,
While writing a partitioned DataFrame out as partitioned text files, I see that
Spark deletes all existing partitions under the output path, even though the
write only produces a few new partitions.

dataDF.write
  .partitionBy("year", "month", "date")
  .mode(SaveMode.Overwrite)
  .text("s3://data/test2/events/")


Is this the expected behavior?

I have a correction job that overwrites a couple of past partitions based on
newly arriving data. I only want those specific partitions to be removed and
rewritten.

Is there a neater way to do that other than the following (rough sketch after
the list):
- Find the affected partitions
- Delete them using the Hadoop FileSystem API
- Write the DF in Append mode
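In case it helps frame the question, this is roughly what I mean by those three
steps. It is only a sketch: it assumes a SparkSession named spark, the same
dataDF and output path as above, and that the partition columns are year, month
and date.

  import org.apache.hadoop.fs.Path
  import org.apache.spark.sql.SaveMode

  val basePath = "s3://data/test2/events/"

  // 1. Find the partition values present in the incoming correction data
  val partitionsToReplace = dataDF
    .select("year", "month", "date")
    .distinct()
    .collect()

  // 2. Delete only those partition directories via the Hadoop FileSystem API
  val fs = new Path(basePath)
    .getFileSystem(spark.sparkContext.hadoopConfiguration)
  partitionsToReplace.foreach { row =>
    val dir = new Path(
      s"${basePath}year=${row.get(0)}/month=${row.get(1)}/date=${row.get(2)}")
    if (fs.exists(dir)) fs.delete(dir, true)
  }

  // 3. Append the corrected data so untouched partitions are left alone
  dataDF.write
    .partitionBy("year", "month", "date")
    .mode(SaveMode.Append)
    .text(basePath)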


Cheers
Yash
