Thanks Ryan for your response. Let me spend more time looking into Spark 3.0's overwrite behavior.
Best Regards
Saisai

Ryan Blue <[email protected]> wrote on Sat, Nov 23, 2019 at 1:08 AM:

> Saisai,
>
> Iceberg's behavior matches Hive's and Spark's behavior when using dynamic
> overwrite mode.
>
> Spark does not specify the correct behavior -- it varies by source. In
> addition, it isn't possible for a v2 source in 2.4 to implement the static
> overwrite mode that is Spark's default. The problem is that the source is
> not passed the static partition values, only rows.
>
> This is fixed in 3.0 because Spark will choose its behavior and correctly
> configure the source with a dynamic overwrite or an overwrite using an
> expression.
>
> On Thu, Nov 21, 2019 at 11:33 PM Saisai Shao <[email protected]> wrote:
>
>> Hi Team,
>>
>> I found that Iceberg's "overwrite" is different from Spark's built-in
>> sources like Parquet. The "overwrite" semantics in Iceberg seem more like
>> "upsert": partitions that are absent from the new data are not deleted.
>>
>> I would like to know the purpose of this design choice. Also, if I want
>> to achieve Spark Parquet's "overwrite" semantics, how would I achieve
>> this?
>>
>> Warning
>>
>> *Spark does not define the behavior of DataFrame overwrite*. Like most
>> sources, Iceberg will dynamically overwrite partitions when the dataframe
>> contains rows in a partition. Unpartitioned tables are completely
>> overwritten.
>>
>> Best regards,
>> Saisai
>
> --
> Ryan Blue
> Software Engineer
> Netflix
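For anyone following along, the difference between the two modes discussed above can be sketched with a toy model. This is not Spark or Iceberg code, just an illustration where a "table" is a dict mapping a partition key to its rows: static overwrite replaces the whole table, while dynamic overwrite (the Hive/Iceberg behavior) replaces only the partitions present in the incoming data.

```python
def overwrite_static(table, new_data):
    """Static overwrite: the entire table is replaced by the new data.
    Partitions absent from new_data are dropped."""
    return dict(new_data)

def overwrite_dynamic(table, new_data):
    """Dynamic overwrite: only partitions that appear in new_data are
    replaced; all other existing partitions are kept untouched."""
    result = dict(table)
    result.update(new_data)
    return result

table = {"2019-11-21": ["a"], "2019-11-22": ["b"]}
incoming = {"2019-11-22": ["b2"]}

print(overwrite_static(table, incoming))
# {'2019-11-22': ['b2']}            -- 2019-11-21 is gone
print(overwrite_dynamic(table, incoming))
# {'2019-11-21': ['a'], '2019-11-22': ['b2']}  -- 2019-11-21 survives
```

In Spark itself this choice is governed by the `partitionOverwriteMode` setting (`static` vs `dynamic`); as Ryan notes, in 3.0 Spark passes the chosen mode down to the source so v2 sources like Iceberg can honor it.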
