[
https://issues.apache.org/jira/browse/SPARK-54595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Szehon Ho updated SPARK-54595:
------------------------------
Description:
While testing this feature, [~aokolnychyi] noted that as of Spark 4.1 the
behavior of MERGE INTO with UPDATE * has changed even when the SCHEMA
EVOLUTION clause is absent.
In particular:
* Source has fewer columns/nested fields than target => we fill the missing
ones with NULL or DEFAULT for inserts, and keep the existing value for updates
(though we disabled this for nested structs by default in SPARK-54525).
* Source has more columns/fields than target => we drop the extra fields.
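
The two rules above can be sketched as a toy model over plain dicts. This is
only an illustration of the described semantics for top-level columns, not
Spark code; the helper names (insert_star, update_star), the column names, and
the default values are all hypothetical, and nested fields are ignored.

```python
# Toy model of the rules above; NOT Spark's implementation.
# All names and values here are hypothetical and for illustration only.

def insert_star(target_columns, defaults, source_row):
    """INSERT *: a column missing from the source gets its DEFAULT if one
    exists, otherwise NULL (None); extra source columns are dropped."""
    return {col: source_row.get(col, defaults.get(col)) for col in target_columns}


def update_star(target_row, source_row):
    """UPDATE *: a column missing from the source keeps its existing value;
    extra source columns are dropped."""
    return {col: source_row.get(col, old) for col, old in target_row.items()}


target_columns = ["id", "name", "status"]
defaults = {"status": "active"}

# Source lacks "name" and "status" and carries an extra column.
new_row = insert_star(target_columns, defaults, {"id": 1, "extra": "x"})
print(new_row)  # {'id': 1, 'name': None, 'status': 'active'}

# Source lacks "status" and carries an extra column.
existing = {"id": 1, "name": "old", "status": "inactive"}
print(update_star(existing, {"id": 1, "name": "new", "extra": "x"}))
# {'id': 1, 'name': 'new', 'status': 'inactive'}
```

In this model the target's column set never changes, which is the sense in
which the schema is not altered.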
Initially, I thought it was a good improvement to MERGE INTO and not really
related to SCHEMA EVOLUTION, because the schema is not altered. But Anton has
a good point that it may surprise some users. So it may be better for now to
be more conservative and keep the existing behavior unchanged when the SCHEMA
EVOLUTION clause is absent, and relax it later once there is more clarity.
Instead, we can apply the new behavior only if SCHEMA EVOLUTION is specified,
as the user is then more explicit about the decision.
was:
While testing this feature, [~aokolnychyi] noted that as of Spark 4.1 the
behavior of MERGE INTO with UPDATE * has changed even when the SCHEMA
EVOLUTION clause is absent.
In particular:
* Source has fewer columns/nested fields than target => we fill the missing
ones with NULL or DEFAULT for inserts, and keep the existing value for updates
(though we disabled this for nested structs by default in SPARK-54525).
* Source has more columns/fields than target => we drop the extra fields.
Initially, I thought it was a good improvement to MERGE INTO, but Anton has a
good point that it may surprise some users. So it may be better for now to be
more conservative and keep the existing behavior unchanged when the SCHEMA
EVOLUTION clause is absent, and relax it later once there is more clarity.
> Keep existing behavior without SCHEMA EVOLUTION clause
> ------------------------------------------------------
>
> Key: SPARK-54595
> URL: https://issues.apache.org/jira/browse/SPARK-54595
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 4.1.0
> Reporter: Szehon Ho
> Priority: Major