Hey Russell,

Great catch on the documentation. It seems out of date. I'm honestly against different DataSources having different default SaveModes. Users will have no clue whether a DataSource implementation is V1 or V2, and it seems weird for the default value to change based on something they have no visibility into. This would cause even more confusion for connectors like the Cassandra Connector or Delta Lake, which have been V1 DataSources for a long time and may continue to have both code paths for a while.

What problem are you running into right now that makes you prefer different defaults?

Best,
Burak

On Wed, May 20, 2020 at 2:50 PM Russell Spitzer <russell.spit...@gmail.com> wrote:

> While the ScalaDocs for DataFrameWriter say
>
> /**
>  * Specifies the behavior when data or table already exists. Options include:
>  * <ul>
>  * <li>`SaveMode.Overwrite`: overwrite the existing data.</li>
>  * <li>`SaveMode.Append`: append the data.</li>
>  * <li>`SaveMode.Ignore`: ignore the operation (i.e. no-op).</li>
>  * <li>`SaveMode.ErrorIfExists`: throw an exception at runtime.</li>
>  * </ul>
>  * <p>
>  * When writing to data source v1, the default option is `ErrorIfExists`. When writing to data
>  * source v2, the default option is `Append`.
>  *
>  * @since 1.4.0
>  */
>
> As far as I can tell, using DataFrameWriter with a TableProviding
> DataSource V2 will still default to ErrorIfExists, which breaks existing
> code since DSV2 cannot support ErrorIfExists mode. I noticed in the history
> of DataFrameWriter there were versions which differentiated between DSV2
> and DSV1 and set the mode accordingly, but this seems to no longer be the
> case. Was this intentional? I feel like if we could
> have the default be based on the Source then upgrading code from DSV1 ->
> DSV2 would be much easier for users.
>
> I'm currently testing this on RC2
>
> Any thoughts?
>
> Thanks for your time as usual,
> Russ
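
For anyone following the thread: one way to avoid depending on whichever default applies is to set the mode explicitly on every write. The snippet below is only a minimal sketch of that idea. The `parquet` format, the temp directory, and the `ExplicitSaveMode` / `appendTwice` names are illustrative assumptions, standing in for the actual connector discussed above; whether a particular DSV2 source accepts a given mode through DataFrameWriter still depends on that source.

```scala
import java.nio.file.Files
import org.apache.spark.sql.{SaveMode, SparkSession}

// Hypothetical helper object, not part of Spark or any connector.
object ExplicitSaveMode {

  // Writes the same three rows twice with an explicitly chosen SaveMode.Append,
  // so the outcome does not depend on the writer's default mode.
  // Returns the number of rows found at `path` afterwards.
  def appendTwice(spark: SparkSession, path: String): Long = {
    val df = spark.range(3).toDF("id")
    df.write.format("parquet").mode(SaveMode.Append).save(path)
    df.write.format("parquet").mode(SaveMode.Append).save(path)
    spark.read.parquet(path).count()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("explicit-save-mode")
      .getOrCreate()
    try {
      // A path that does not exist yet, inside a fresh temp directory (assumption for the demo).
      val path = Files.createTempDirectory("savemode").resolve("out").toString
      println(appendTwice(spark, path))
    } finally {
      spark.stop()
    }
  }
}
```

With the mode spelled out, the second write appends rather than failing; had the second call relied on an `ErrorIfExists` default, it would have thrown because the path already exists.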