[ https://issues.apache.org/jira/browse/SPARK-55636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon reassigned SPARK-55636:
------------------------------------
Assignee: Pranav Dev
> Spark Connect deduplicate throws generic INTERNAL_ERROR instead of UNRESOLVED_COLUMN_AMONG_FIELD_NAMES for invalid column names
> -------------------------------------------------------------------------------------------------------------------------------
>
> Key: SPARK-55636
> URL: https://issues.apache.org/jira/browse/SPARK-55636
> Project: Spark
> Issue Type: Bug
> Components: Connect
> Affects Versions: 4.2.0
> Reporter: Pranav Dev
> Assignee: Pranav Dev
> Priority: Major
> Labels: pull-request-available
> Original Estimate: 336h
> Remaining Estimate: 336h
>
> When using Spark Connect, calling `dropDuplicates` with a non-existent column
> name throws a generic `INTERNAL_ERROR` with SQLSTATE `XX000` instead of the
> more helpful `UNRESOLVED_COLUMN_AMONG_FIELD_NAMES` error that classic Spark
> throws.
>
> Example to reproduce:
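> {code:python}
> from pyspark.sql import SparkSession
>
> # Connect to a Spark Connect server (assumes one is running locally; adjust the URL)
> spark = SparkSession.builder.remote("sc://localhost").getOrCreate()
>
> # Create a sample DataFrame
> df1 = spark.createDataFrame([
>     (1, "Song A", "Artist A"),
>     (2, "Song B", "Artist B"),
>     (3, "Song C", "Artist C"),
> ], ["id", "song_name", "artist_name"])
> df1.show()
> df1.printSchema()
>
> # Try to deduplicate on 'artist_id', which doesn't exist
> df1.dropDuplicates(["artist_id"]).show()
> {code}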
>
> Current behavior (Spark Connect):
> {code:java}
> [INTERNAL_ERROR] Invalid deduplicate column artist_id SQLSTATE: XX000{code}
>
> Classic Spark:
> {code:java}
> Cannot resolve column name "artist_id" among (id, song_name, artist_name).{code}
>
> Expected behavior (Spark Connect):
> {code:java}
> [UNRESOLVED_COLUMN_AMONG_FIELD_NAMES] Cannot resolve column name "artist_id" among (id, song_name, artist_name). SQLSTATE: 42703{code}
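> The expected behavior amounts to validating each deduplicate column against the DataFrame's field names and, on a miss, raising an error that carries the `UNRESOLVED_COLUMN_AMONG_FIELD_NAMES` class and SQLSTATE `42703`. The pure-Python sketch below only illustrates that contract; it is not Spark's actual implementation. `HypotheticalAnalysisError` and `check_dedup_columns` are hypothetical names, and the default case-insensitive matching is an assumption modeled on Spark's default `spark.sql.caseSensitive=false`.
>
> {code:python}
> from typing import Sequence
>
>
> class HypotheticalAnalysisError(Exception):
>     """Hypothetical stand-in for a Spark error with an error class and SQLSTATE."""
>
>     def __init__(self, error_class: str, sql_state: str, message: str) -> None:
>         super().__init__(message)
>         self.error_class = error_class
>         self.sql_state = sql_state
>         self.message = message
>
>     def __str__(self) -> str:
>         # Mirror the "[CLASS] message SQLSTATE: xxxxx" layout shown in the issue.
>         return f"[{self.error_class}] {self.message} SQLSTATE: {self.sql_state}"
>
>
> def check_dedup_columns(
>     field_names: Sequence[str],
>     dedup_cols: Sequence[str],
>     case_sensitive: bool = False,  # assumption: mirrors Spark's default resolution
> ) -> None:
>     """Raise on the first deduplicate column not found among field_names."""
>     if case_sensitive:
>         known = set(field_names)
>     else:
>         known = {name.lower() for name in field_names}
>     for col in dedup_cols:
>         key = col if case_sensitive else col.lower()
>         if key not in known:
>             raise HypotheticalAnalysisError(
>                 error_class="UNRESOLVED_COLUMN_AMONG_FIELD_NAMES",
>                 sql_state="42703",
>                 message=f'Cannot resolve column name "{col}" among ({", ".join(field_names)}).',
>             )
>
>
> try:
>     check_dedup_columns(["id", "song_name", "artist_name"], ["artist_id"])
> except HypotheticalAnalysisError as e:
>     # [UNRESOLVED_COLUMN_AMONG_FIELD_NAMES] Cannot resolve column name "artist_id" among (id, song_name, artist_name). SQLSTATE: 42703
>     print(e)
> {code}
>
> Raising a dedicated, classified error (rather than a generic internal failure) is what lets Connect clients match on the error class and SQLSTATE, as they can in classic Spark.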
--
This message was sent by Atlassian Jira (v8.20.10#820010)