[
https://issues.apache.org/jira/browse/SPARK-57223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Eric Yang updated SPARK-57223:
------------------------------
Description:
h3. Symptom
When a DataSource V2 write, INSERT, or MERGE fails with an
{{INCOMPATIBLE_DATA_FOR_TABLE}} error, a column or struct-field name that
contains a dot is rendered as two back-quoted identifiers instead of one. For a
column literally named {{a.b}}, the error shows {{`a`.`b`}} (which reads as
field {{b}} nested in struct {{a}}) instead of {{`a.b`}}.
Example:
{code:java}
spark.table("src").withColumnRenamed("data", "a.b").writeTo("cat.t").append()
// EXTRA_COLUMNS reports extraColumns = `a`.`b`   (wrong)
// expected:             extraColumns = `a.b`
{code}
Affected error classes: {{INCOMPATIBLE_DATA_FOR_TABLE.EXTRA_COLUMNS}},
{{EXTRA_STRUCT_FIELDS}}, {{STRUCT_MISSING_FIELDS}}.
h3. Root cause
These messages quote the name via {{toSQLId(String)}}, which parses it
through {{AttributeNameParser}} and splits on dots; that is, it treats a
stored, verbatim name as SQL text.
h3. Fix (scope of this ticket)
Quote the verbatim name in the affected renders inside {{TableOutputResolver}}
(the component that resolves a query's output columns against a target schema)
by passing it as {{toSQLId(Seq(name))}}. The shared {{toSQLId}} helper is
intentionally left unchanged: it has ~190 callers, many of which legitimately
pass qualified, multi-part names and rely on the split, so changing it would
risk wide regressions.
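The difference between the two call shapes can be illustrated with a small standalone model. This is only a sketch, not Spark's code: {{QuoteSketch}}, {{quoteParsed}}, {{quoteParts}} and {{quotePart}} are hypothetical names, and the parsing is assumed to be a plain split on dots outside back-quotes, which approximates what the String overload does via {{AttributeNameParser}}.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Simplified, hypothetical model of the two quoting paths; not Spark's implementation.
final class QuoteSketch {

    // Quote a single name part: wrap in back-quotes, doubling embedded back-quotes.
    static String quotePart(String part) {
        return "`" + part.replace("`", "``") + "`";
    }

    // Models the Seq-based call: every list element is one verbatim name part.
    static String quoteParts(List<String> parts) {
        return parts.stream()
                .map(QuoteSketch::quotePart)
                .collect(Collectors.joining("."));
    }

    // Models the String-based call: the input is treated as SQL text and
    // split on every dot that is not inside back-quotes.
    static String quoteParsed(String sqlText) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inBackticks = false;
        for (int i = 0; i < sqlText.length(); i++) {
            char c = sqlText.charAt(i);
            if (c == '`') {
                boolean escaped = inBackticks
                        && i + 1 < sqlText.length()
                        && sqlText.charAt(i + 1) == '`';
                if (escaped) {
                    current.append('`'); // "``" inside quotes is a literal back-quote
                    i++;
                } else {
                    inBackticks = !inBackticks;
                }
            } else if (c == '.' && !inBackticks) {
                parts.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        parts.add(current.toString());
        return quoteParts(parts);
    }

    public static void main(String[] args) {
        // A stored column name "a.b" passed as SQL text is split in two.
        System.out.println(quoteParsed("a.b"));         // `a`.`b`
        // The same name passed as a single verbatim part stays whole.
        System.out.println(quoteParts(List.of("a.b"))); // `a.b`
        // A genuinely qualified name still relies on the split.
        System.out.println(quoteParsed("cat.t"));       // `cat`.`t`
    }
}
```

In this model, the third call shows why the shared helper is left alone: callers that pass a qualified name as text depend on the split, so only the call sites holding a stored, verbatim name switch to the single-part form.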
h3. Follow-ups (if this is accepted)
The same root cause affects single-name renders in other components / error
classes (e.g. {{INSERT_COLUMN_ARITY_MISMATCH}}, {{UNPIVOT}},
struct-field / {{ALTER COLUMN}} errors). Those are distinct user-facing
symptoms and will be handled in separate follow-up tickets.
> INCOMPATIBLE_DATA_FOR_TABLE errors mis-quote a column/field name that
> contains a dot
> ------------------------------------------------------------------------------------
>
> Key: SPARK-57223
> URL: https://issues.apache.org/jira/browse/SPARK-57223
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 5.0.0
> Reporter: Eric Yang
> Priority: Major
> Labels: pull-request-available