[ 
https://issues.apache.org/jira/browse/SPARK-38193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-38193:
---------------------------------
    Component/s: SQL
                     (was: Spark Core)

> [Spark Core] [Feature] change of unionByName parameter
> ------------------------------------------------------
>
>                 Key: SPARK-38193
>                 URL: https://issues.apache.org/jira/browse/SPARK-38193
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 3.2.1
>            Reporter: Daniel Davies
>            Priority: Minor
>
> Hello,
> I have a quick question about the unionByName function. It currently accepts 
> a parameter, "allowMissingColumns", that allows some tolerance when merging 
> datasets with different schemas 
> [here|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala#L2170];
>  but the implementation is somewhat restrictive: because the second 
> parameter is a boolean, the only tolerant behavior available is for 
> unionByName to add all columns from both dataframes. We have other use cases 
> in our workflows, for example taking only the columns that are present in 
> both dataframes (and I assume other users will have different merge 
> strategies in mind as well). Does it seem reasonable to generalize the 
> boolean "allowMissingColumns" parameter into a string-typed "mode" parameter 
> natively in Spark? If so, I'm happy to open a PR for this. The change would 
> involve amending the 
> [ResolveUnion.scala|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveUnion.scala]
>  utility to make it more flexible in merging columns; to a user it would look 
> much more like the 'join' operator, where a join strategy is selected. 
> I've also posted this question on the dev mailing list; happy to continue the 
> conversation there if that is preferable.
>  
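
To make the proposal above concrete, here is a minimal plain-Python sketch of how a string-typed "mode" could select the output columns of a by-name union. It is not Spark code and not the reporter's implementation: the function names and the mode values "strict", "outer", and "inner" are hypothetical, chosen only for illustration. "strict" and "outer" are meant to approximate the existing allowMissingColumns=false and allowMissingColumns=true behavior, and "inner" is the "only columns in both dataframes" case from the description.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[object]]


def resolve_columns(left: List[str], right: List[str], mode: str) -> List[str]:
    """Pick the output columns of a by-name union (hypothetical modes)."""
    if mode == "strict":
        # Approximates allowMissingColumns=false: column sets must match.
        if set(left) != set(right):
            raise ValueError("column sets differ and mode is 'strict'")
        return list(left)
    if mode == "outer":
        # Approximates allowMissingColumns=true: keep every column,
        # left-hand columns first, then the right-only columns.
        return list(left) + [c for c in right if c not in left]
    if mode == "inner":
        # The extra strategy from the description: only shared columns.
        return [c for c in left if c in right]
    raise ValueError(f"unknown mode: {mode}")


def union_by_name(
    left_rows: List[Row],
    right_rows: List[Row],
    left_cols: List[str],
    right_cols: List[str],
    mode: str = "strict",
) -> List[Row]:
    """Union two row lists by column name, filling missing values with None."""
    cols = resolve_columns(left_cols, right_cols, mode)
    return [{c: row.get(c) for c in cols} for row in left_rows + right_rows]


if __name__ == "__main__":
    left = [{"id": 1, "name": "a"}]
    right = [{"id": 2, "age": 30}]
    for mode in ("outer", "inner"):
        print(mode, union_by_name(left, right, ["id", "name"], ["id", "age"], mode))
```

As with the 'join' operator, the caller names a strategy instead of flipping a boolean, so further strategies could be added without changing the signature again.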



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
