Re: [DatasourceV2] Default Mode for DataFrameWriter not Dependent on DataSource Version

Burak Yavuz Wed, 20 May 2020 15:01:11 -0700

Hey Russell,

Great catch on the documentation; it looks out of date. Honestly, I am
against different DataSources having different default SaveModes. Users
will have no clue whether a DataSource implementation is V1 or V2, and it
seems weird for the default to change based on something they cannot see.
This would cause even more confusion for connectors like the Cassandra
Connector or Delta Lake, which have been V1 DataSources for a long time
and may continue to have both code paths for a while.
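
Either way, callers can avoid depending on any default by setting the mode
explicitly. Here is a minimal sketch of that, assuming Spark is on the
classpath. The helper name `appendTo` and its parameters are hypothetical
and only for illustration; the format name and options are whatever your
connector expects.

```scala
import org.apache.spark.sql.{DataFrame, SaveMode}

object ExplicitModeWrite {
  // Hypothetical helper, not part of Spark: writes `df` through the given
  // data source with an explicit SaveMode, so the result does not depend on
  // which default DataFrameWriter picks for a V1 or V2 source.
  def appendTo(df: DataFrame, format: String, options: Map[String, String]): Unit =
    df.write
      .format(format)         // placeholder: the connector's short or class name
      .options(options)       // connector-specific options
      .mode(SaveMode.Append)  // set explicitly instead of relying on the default
      .save()
}
```

With the mode spelled out at the call site, the same code should behave the
same whether the connector resolves to its V1 or its V2 code path.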


What problem are you running into right now that makes you prefer
different defaults?

Best,
Burak

On Wed, May 20, 2020 at 2:50 PM Russell Spitzer <russell.spit...@gmail.com>
wrote:

>
> While the ScalaDocs for DataFrameWriter say
>
> /**
>  * Specifies the behavior when data or table already exists. Options include:
>  * <ul>
>  * <li>`SaveMode.Overwrite`: overwrite the existing data.</li>
>  * <li>`SaveMode.Append`: append the data.</li>
>  * <li>`SaveMode.Ignore`: ignore the operation (i.e. no-op).</li>
>  * <li>`SaveMode.ErrorIfExists`: throw an exception at runtime.</li>
>  * </ul>
>  * <p>
>  * When writing to data source v1, the default option is `ErrorIfExists`. When writing to data
>  * source v2, the default option is `Append`.
>  *
>  * @since 1.4.0
>  */
>
>
> As far as I can tell, using DataFrameWriter with a TableProviding
> DataSource V2 will still default to ErrorIfExists, which breaks existing
> code since DSV2 cannot support ErrorIfExists mode. I noticed in the history
> of DataFrameWriter that some versions differentiated between DSV2 and
> DSV1 and set the mode accordingly, but this no longer seems to be the
> case. Was this intentional? I feel like if we could have the default be
> based on the source, then upgrading code from DSV1 -> DSV2 would be much
> easier for users.
>
> I'm currently testing this on RC2
>
>
> Any thoughts?
>
> Thanks for your time as usual,
> Russ
>
