[ 
https://issues.apache.org/jira/browse/SPARK-54595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szehon Ho updated SPARK-54595:
------------------------------
    Description: 
While testing this feature, [~aokolnychyi] noted that as of Spark 4.1 the 
behavior of MERGE INTO with UPDATE * has changed even when the SCHEMA 
EVOLUTION clause is not specified.

In particular:
* Source has fewer columns/nested fields than target => we fill with NULL or 
DEFAULT for inserts, and keep the existing value for updates (though this is 
disabled by default for nested structs in SPARK-54525).
* Source has more columns/fields than target => we drop the extra fields.
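
To make the two cases above concrete, here is a minimal Python sketch of the 
per-row column alignment. It is only a toy model, not Spark's implementation: 
the function name align_row, its parameters, and the dict-based row 
representation are hypothetical, and it covers top-level columns only (no 
nested structs).

```python
from typing import Any, Dict, List, Optional


def align_row(
    source_row: Dict[str, Any],
    target_columns: List[str],
    defaults: Optional[Dict[str, Any]] = None,
    existing_row: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Toy model of aligning a source row to the target's columns.

    existing_row is None  -> models an insert (missing columns get DEFAULT or NULL)
    existing_row is given -> models an update (missing columns keep their value)
    Source columns that the target does not have are dropped in both cases.
    """
    defaults = defaults or {}
    aligned: Dict[str, Any] = {}
    for col in target_columns:
        if col in source_row:
            # Column is present in the source: take the source value.
            aligned[col] = source_row[col]
        elif existing_row is not None:
            # Update with a missing source column: keep the existing value.
            aligned[col] = existing_row[col]
        else:
            # Insert with a missing source column: DEFAULT if declared, else NULL.
            aligned[col] = defaults.get(col)
    return aligned


if __name__ == "__main__":
    target = ["id", "name", "region"]
    # Insert: source lacks "region", so the declared default is used.
    print(align_row({"id": 1, "name": "a"}, target, defaults={"region": "us"}))
    # Update: source lacks "region", so the existing value is kept.
    print(align_row({"id": 1, "name": "b"}, target,
                    existing_row={"id": 1, "name": "a", "region": "eu"}))
    # Source has an extra column, which is dropped.
    print(align_row({"id": 2, "name": "c", "region": "x", "extra": 9}, target))
```

Because the loop only walks the target's columns, the target schema itself is 
never changed, which is why this behavior is separate from schema evolution.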

Initially, I thought this was a good improvement to MERGE INTO and unrelated 
to SCHEMA EVOLUTION, because the schema is not altered.  But Anton has a good 
point that it may surprise some users.  So for now it may be better to be more 
conservative and keep exactly the previous behavior when the SCHEMA EVOLUTION 
clause is absent, and relax it later once there is more clarity.  Instead, we 
can apply the new behavior only if SCHEMA EVOLUTION is specified, since the 
user is then being explicit about the decision.

  was:
While testing this feature, [~aokolnychyi] noted that as of Spark 4.1 the 
behavior of MERGE INTO with UPDATE * has changed even when the SCHEMA 
EVOLUTION clause is not specified.

In particular:
* Source has fewer columns/nested fields than target => we fill with NULL or 
DEFAULT for inserts, and keep the existing value for updates (though this is 
disabled by default for nested structs in SPARK-54525).
* Source has more columns/fields than target => we drop the extra fields.

Initially, I thought this was a good improvement to MERGE INTO, but Anton has a 
good point that it may surprise some users.  So for now it may be better to be 
more conservative and keep exactly the previous behavior when the SCHEMA 
EVOLUTION clause is absent, and relax it later once there is more clarity.


> Keep existing behavior without SCHEMA EVOLUTION clause
> ------------------------------------------------------
>
>                 Key: SPARK-54595
>                 URL: https://issues.apache.org/jira/browse/SPARK-54595
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 4.1.0
>            Reporter: Szehon Ho
>            Priority: Major


