[ 
https://issues.apache.org/jira/browse/SPARK-57223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated SPARK-57223:
------------------------------
    Description: 
*Symptom*

Error messages involving a column or struct field whose name contains a dot or 
space display the identifier incorrectly. For example, for a column named 
{{job.title}}:

{noformat}
Fail to assign a value of "INT" type to the "SMALLINT" type column or variable 
`job`.`title`
{noformat}

Expected:
{noformat}
Fail to assign a value of "INT" type to the "SMALLINT" type column or variable 
`job.title`
{noformat}

*Root Cause*

{{DataTypeErrorsBase.toSQLId(parts: String)}} passes its input through 
{{AttributeNameParser.parseAttributeName}}, which interprets the string as SQL 
text and splits on unquoted dots:

{code:scala}
def toSQLId(parts: String): String =
  toSQLId(AttributeNameParser.parseAttributeName(parts))  // "job.title" -> 
Seq("job","title")
{code}

The String overload is intended for user-typed SQL identifier text. However, 
many call sites pass verbatim stored names from the internal object model -- 
{{toSQLId(col.name)}}, {{toSQLId(attr.name)}}, {{toSQLId(f.name)}}, etc. -- 
where the string is a raw single-part name that may contain a dot or space.

*Fix:* change the String overload body to {{toSQLId(Seq(parts))}}. All 
verbatim-name call sites are fixed automatically.

*Pre-existing Misuse of toSQLId(String) (E1-E6)*

Six call sites pass already-formatted SQL strings and currently work by 
accident. They must be corrected before the overload change, or addressed as a 
follow-up. All are in error-message-only paths -- no execution risk.

|| # || Location || Argument || Why it is wrong || Correct fix ||
| E1 | CheckAnalysis.scala:441,446,1213,1221,1236,1239,1242,1246,1251,1266 | 
table.name / r.name | ResolvedTable.name is already a pre-formatted string | 
Drop toSQLId; use table.name directly |
| E2 | CheckAnalysis.scala:875,883 | column.quoted | Array[String] pre-joined 
to a formatted string before passing | Store as column.toImmutableArraySeq so 
toSQLId uses the Seq overload |
| E3 | CheckAnalysis.scala:~1227 | col.name.quoted | Seq[String] pre-formatted; 
fieldName also used as bare String at line 1277 (compilation coupling) | Use 
col.name (Seq[String]) for toSQLId calls; keep col.name.quoted at line 1277 |
| E4 | QueryCompilationErrors.scala:728,1318 | FunctionIdentifier.toString | 
Unquoted but dot-joined; mis-splits if name contains a dot | Use x.nameParts |
| E5 | QueryCompilationErrors.scala:2982,2994 | TableIdentifier.toString | 
Fully backtick-quoted (inherits quotedString); would double-quote | Use 
tableIdent.nameParts |
| E6 | JDBCTable.scala:84,107 | ident.toString | Conditionally-quoted 
dot-joined multi-part JDBC name | Use toSQLId(ident.namespace().toSeq :+ 
ident.name()) |

> toSQLId(String) misrepresents column/field names containing dots or spaces in 
> error messages
> --------------------------------------------------------------------------------------------
>
>                 Key: SPARK-57223
>                 URL: https://issues.apache.org/jira/browse/SPARK-57223
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 5.0.0
>            Reporter: Eric Yang
>            Priority: Major
>
> *Symptom*
> Error messages involving a column or struct field whose name contains a dot 
> or space display the identifier incorrectly. For example, for a column named 
> {{job.title}}:
> {noformat}
> Fail to assign a value of "INT" type to the "SMALLINT" type column or 
> variable `job`.`title`
> {noformat}
> Expected:
> {noformat}
> Fail to assign a value of "INT" type to the "SMALLINT" type column or 
> variable `job.title`
> {noformat}
> *Root Cause*
> {{DataTypeErrorsBase.toSQLId(parts: String)}} passes its input through 
> {{AttributeNameParser.parseAttributeName}}, which interprets the string as 
> SQL text and splits on unquoted dots:
> {code:scala}
> def toSQLId(parts: String): String =
>   toSQLId(AttributeNameParser.parseAttributeName(parts))  // "job.title" -> 
> Seq("job","title")
> {code}
> The String overload is intended for user-typed SQL identifier text. However, 
> many call sites pass verbatim stored names from the internal object model -- 
> {{toSQLId(col.name)}}, {{toSQLId(attr.name)}}, {{toSQLId(f.name)}}, etc. -- 
> where the string is a raw single-part name that may contain a dot or space.
> *Fix:* change the String overload body to {{toSQLId(Seq(parts))}}. All 
> verbatim-name call sites are fixed automatically.
> *Pre-existing Misuse of toSQLId(String) (E1-E6)*
> Six call sites pass already-formatted SQL strings and currently work by 
> accident. They must be corrected before the overload change, or addressed as 
> a follow-up. All are in error-message-only paths -- no execution risk.
> || # || Location || Argument || Why it is wrong || Correct fix ||
> | E1 | CheckAnalysis.scala:441,446,1213,1221,1236,1239,1242,1246,1251,1266 | 
> table.name / r.name | ResolvedTable.name is already a pre-formatted string | 
> Drop toSQLId; use table.name directly |
> | E2 | CheckAnalysis.scala:875,883 | column.quoted | Array[String] pre-joined 
> to a formatted string before passing | Store as column.toImmutableArraySeq so 
> toSQLId uses the Seq overload |
> | E3 | CheckAnalysis.scala:~1227 | col.name.quoted | Seq[String] 
> pre-formatted; fieldName also used as bare String at line 1277 (compilation 
> coupling) | Use col.name (Seq[String]) for toSQLId calls; keep 
> col.name.quoted at line 1277 |
> | E4 | QueryCompilationErrors.scala:728,1318 | FunctionIdentifier.toString | 
> Unquoted but dot-joined; mis-splits if name contains a dot | Use x.nameParts |
> | E5 | QueryCompilationErrors.scala:2982,2994 | TableIdentifier.toString | 
> Fully backtick-quoted (inherits quotedString); would double-quote | Use 
> tableIdent.nameParts |
> | E6 | JDBCTable.scala:84,107 | ident.toString | Conditionally-quoted 
> dot-joined multi-part JDBC name | Use toSQLId(ident.namespace().toSeq :+ 
> ident.name()) |



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to