[ https://issues.apache.org/jira/browse/SPARK-57223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Yang updated SPARK-57223:
------------------------------
    Description: 
h3. Symptom

When a DataSource V2 write / INSERT / MERGE fails with an
{{INCOMPATIBLE_DATA_FOR_TABLE}} error, a column or struct-field name that
contains a dot is rendered as two back-quoted identifiers instead of one. For a
column literally named {{a.b}}, the error shows {{`a`.`b`}} (which reads as
field {{b}} nested in struct {{a}}) instead of {{`a.b`}}.

Example:
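{code:scala}
// Assumes src has a column named data, and cat.t is a DataSource V2 table
// with no column named `a.b`, so `a.b` is reported as an extra column.
spark.table("src").withColumnRenamed("data", "a.b").writeTo("cat.t").append()
// EXTRA_COLUMNS reports extraColumns = `a`.`b`  (actual, wrong)
// EXTRA_COLUMNS reports extraColumns = `a.b`    (expected)
{code}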
Affected error classes: {{INCOMPATIBLE_DATA_FOR_TABLE.EXTRA_COLUMNS}},
{{INCOMPATIBLE_DATA_FOR_TABLE.EXTRA_STRUCT_FIELDS}} and
{{INCOMPATIBLE_DATA_FOR_TABLE.STRUCT_MISSING_FIELDS}}.

h3. Root cause

These messages quote the name via {{toSQLId(String)}}, which parses it with
{{AttributeNameParser}} and splits it on every dot outside back-quotes. In other
words, it treats a stored, verbatim name as SQL text.

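
To illustrate the root cause, here is a minimal, self-contained model of the two quoting paths. It is a sketch under stated assumptions: {{quoteIdentifier}}, {{parseNameParts}}, {{quoteParts}} and {{quoteParsed}} are hypothetical stand-ins, not Spark's actual {{QuotingUtils}} / {{AttributeNameParser}} code, and the toy parser only understands dots and back-quotes.
{code:scala}
object QuotingModel {
  // Wrap one identifier part in back-quotes, doubling any embedded back-quote.
  def quoteIdentifier(part: String): String =
    "`" + part.replace("`", "``") + "`"

  // Hypothetical, simplified stand-in for attribute-name parsing: splits on
  // dots outside back-quotes; "``" inside quotes is a literal back-quote.
  def parseNameParts(name: String): Seq[String] = {
    val parts = scala.collection.mutable.ArrayBuffer.empty[String]
    val current = new StringBuilder
    var inQuotes = false
    var i = 0
    while (i < name.length) {
      val c = name.charAt(i)
      if (c == '`') {
        if (inQuotes && i + 1 < name.length && name.charAt(i + 1) == '`') {
          current.append('`') // escaped back-quote inside a quoted part
          i += 1
        } else inQuotes = !inQuotes
      } else if (c == '.' && !inQuotes) {
        parts += current.toString // unquoted dot: close the current part
        current.clear()
      } else current.append(c)
      i += 1
    }
    parts += current.toString
    parts.toSeq
  }

  // Analogue of toSQLId(Seq[String]): each element is one verbatim part.
  def quoteParts(parts: Seq[String]): String =
    parts.map(quoteIdentifier).mkString(".")

  // Analogue of toSQLId(String): the input is re-parsed as SQL text first.
  def quoteParsed(name: String): String = quoteParts(parseNameParts(name))

  def main(args: Array[String]): Unit = {
    val stored = "a.b" // a stored column name that literally contains a dot
    println(quoteParsed(stored))     // `a`.`b`  (reads as field b of struct a)
    println(quoteParts(Seq(stored))) // `a.b`
  }
}
{code}
The parsing path is correct for SQL text such as {{`a.b`}} (already quoted), but a raw stored name carries no quotes, so its dot is taken as a separator.
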
h3. Fix (scope of this ticket)

Quote the verbatim name in the affected error messages inside {{TableOutputResolver}}
(the component that resolves a query's output columns against a target schema)
by calling {{toSQLId(Seq(name))}}, which treats the whole name as a single
identifier part. The shared {{toSQLId}} helper is intentionally left unchanged:
it has ~190 callers, many of which legitimately pass qualified, multi-part names
as a single dotted string and rely on the split, so changing it would risk wide
regressions.

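
A minimal sketch of the rendering change, under stated assumptions: {{renderExtraColumns}} is a hypothetical stand-in for the resolver step (it compares names exactly, ignoring Spark's configurable case sensitivity), and {{quoteParts}} mirrors the {{toSQLId(Seq[String])}} overload. Each stored name is wrapped in its own one-element sequence, so its dot is never treated as a separator, while a genuinely multi-part name passed as several parts still renders with dots between them.
{code:scala}
object ExtraColumnsRender {
  // Quote one verbatim identifier part, doubling embedded back-quotes.
  private def quoteIdentifier(part: String): String =
    "`" + part.replace("`", "``") + "`"

  // Analogue of toSQLId(Seq[String]): elements are verbatim, already-split parts.
  def quoteParts(parts: Seq[String]): String =
    parts.map(quoteIdentifier).mkString(".")

  // Hypothetical resolver step: query columns with no counterpart in the target
  // schema are "extra"; each stored name is rendered as a single part.
  def renderExtraColumns(queryCols: Seq[String], tableCols: Seq[String]): Seq[String] =
    queryCols
      .filterNot(name => tableCols.contains(name))
      .map(name => quoteParts(Seq(name)))

  def main(args: Array[String]): Unit = {
    println(renderExtraColumns(Seq("id", "a.b"), Seq("id"))) // List(`a.b`)
    // A multi-part name supplied as separate parts still joins with dots.
    println(quoteParts(Seq("cat", "db", "t")))               // `cat`.`db`.`t`
  }
}
{code}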
h3. Follow-ups if this is accepted

The same root cause affects single-name renderings in other components and error
classes (e.g. {{INSERT_COLUMN_ARITY_MISMATCH}}, {{UNPIVOT}}, struct-field and
{{ALTER COLUMN}} errors). Those are distinct user-facing symptoms and will be
handled in separate follow-up tickets.

  was:
h3. Symptom

When a DataSource V2 write / INSERT / MERGE fails with an
{{INCOMPATIBLE_DATA_FOR_TABLE}} error, a column or struct-field name that
contains a dot is rendered as two back-quoted identifiers instead of one. For a
column literally named {{a.b}}, the error shows {{`a`.`b`}} (which reads as
field {{b}} nested in struct {{a}}) instead of {{`a.b`}}.

Example:
{code:java}
spark.table("src").withColumnRenamed("data", "a.b").writeTo("cat.t").append()
// EXTRA_COLUMNS reports extraColumns = `a`.`b` (wrong)
// `a.b` (expected){code}
Affected error classes: {{INCOMPATIBLE_DATA_FOR_TABLE.EXTRA_COLUMNS}},
{{EXTRA_STRUCT_FIELDS}}, {{STRUCT_MISSING_FIELDS}}.
h3. Root cause

These messages quote the name via {{toSQLId(String)}}, which parses it
through {{AttributeNameParser}} and splits on dots – i.e. it treats a stored,
verbatim name as SQL text.
h3. Fix (scope of this ticket)

Quote the verbatim name in the affected renders inside {{TableOutputResolver}} 
(the component that resolves a query's output columns against a target schema) 
by passing it as {{toSQLId(Seq(name))}}. The shared {{toSQLId}} helper is
intentionally left unchanged: it has ~190 callers, many of which legitimately 
pass qualified, multi-part names and rely on the split, so changing it would 
risk wide regressions.
h3. Follow-ups (if this is accepted)

The same root cause affects single-name renders in other components / error 
classes (e.g. {{INSERT_COLUMN_ARITY_MISMATCH}}, {{UNPIVOT}},
struct-field / {{ALTER COLUMN}} errors). Those are distinct user-facing
symptoms and will be handled in separate follow-up tickets.


> INCOMPATIBLE_DATA_FOR_TABLE errors mis-quote a column/field name that 
> contains a dot
> ------------------------------------------------------------------------------------
>
>                 Key: SPARK-57223
>                 URL: https://issues.apache.org/jira/browse/SPARK-57223
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 5.0.0
>            Reporter: Eric Yang
>            Priority: Major
>              Labels: pull-request-available
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
