[ 
https://issues.apache.org/jira/browse/SPARK-55636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-55636:
------------------------------------

    Assignee: Pranav Dev

> Spark Connect deduplicate throws generic INTERNAL_ERROR instead of 
> UNRESOLVED_COLUMN_AMONG_FIELD_NAMES for invalid column names
> -------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-55636
>                 URL: https://issues.apache.org/jira/browse/SPARK-55636
>             Project: Spark
>          Issue Type: Bug
>          Components: Connect
>    Affects Versions: 4.2.0
>            Reporter: Pranav Dev
>            Assignee: Pranav Dev
>            Priority: Major
>              Labels: pull-request-available
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> When using Spark Connect, calling `dropDuplicates` with a non-existent column 
> name throws a generic `INTERNAL_ERROR` with SQLSTATE `XX000` instead of the 
> more helpful `UNRESOLVED_COLUMN_AMONG_FIELD_NAMES` error that classic Spark 
> throws.
>  
> Example to reproduce:
> {code:python}
> # Create a sample DataFrame
> df1 = spark.createDataFrame([
>     (1, "Song A", "Artist A"),
>     (2, "Song B", "Artist B"),
>     (3, "Song C", "Artist C")
> ], ["id", "song_name", "artist_name"])
> df1.show()
> df1.printSchema()
> # Try to deduplicate on 'artist_id', which doesn't exist
> df1.dropDuplicates(["artist_id"]).show()
> {code}
>  
> Current behavior (Spark Connect):
> {code:none}
> [INTERNAL_ERROR] Invalid deduplicate column artist_id SQLSTATE: XX000{code}
>  
> Behavior in classic Spark (for comparison):
> {code:none}
> Cannot resolve column name "artist_id" among (id, song_name, 
> artist_name).{code}
>  
> Expected behavior (Spark Connect):
> {code:none}
> [UNRESOLVED_COLUMN_AMONG_FIELD_NAMES] Cannot resolve column name "artist_id" 
> among (id, song_name, artist_name). SQLSTATE: 42703{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
