[ https://issues.apache.org/jira/browse/SPARK-38193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon updated SPARK-38193:
---------------------------------
    Component/s: SQL
                     (was: Spark Core)

> [Spark Core] [Feature] change of unionByName parameter
> ------------------------------------------------------
>
>                 Key: SPARK-38193
>                 URL: https://issues.apache.org/jira/browse/SPARK-38193
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 3.2.1
>            Reporter: Daniel Davies
>            Priority: Minor
>
> Hello,
> I have a question about the unionByName function. It currently accepts a
> boolean parameter, "allowMissingColumns", that gives some tolerance when
> merging datasets with different schemas
> [here|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala#L2170].
> The implementation is somewhat restrictive, though: because the second
> parameter is a boolean, the only tolerant behaviour available at the moment
> is for unionByName to add all columns from both dataframes.
>
> We have other use cases in our workflows, for example taking only the
> columns that are present in both dataframes (and I assume other users will
> have different merge strategies in mind as well). Does it seem reasonable to
> extend the parameter from "allowMissingColumns" to a string-typed "mode"
> parameter natively in Spark?
>
> If so, I'm happy to make a PR for this. The change would involve amending
> the
> [ResolveUnion.scala|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveUnion.scala]
> utility to make it more flexible in merging columns; to a user it would look
> much more like the 'join' operator, where a join strategy is selected.
>
> I've also posted this question on the dev mailing list; happy to continue
> the conversation there if that is preferable.
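
The column-merging strategies discussed in the issue can be illustrated with a small pure-Python sketch that works on column names only. It is not Spark code and not the proposed patch: the function name `resolve_union_columns` and the mode strings `"strict"`, `"union"`, and `"intersection"` are hypothetical names chosen here for illustration. The sketch assumes case-sensitive, top-level column names with no duplicates, and ignores nested struct fields.

```python
from typing import List


def resolve_union_columns(left: List[str], right: List[str], mode: str = "strict") -> List[str]:
    """Return the output column names of a by-name union for a hypothetical `mode`.

    The mode names are illustrative assumptions, not an existing Spark API.
    """
    if mode == "strict":
        # Corresponds to allowMissingColumns=false: both sides must expose
        # the same set of column names; the left side's order is kept.
        if set(left) != set(right):
            raise ValueError(f"column sets differ: {left} vs {right}")
        return list(left)
    if mode == "union":
        # Corresponds to allowMissingColumns=true: keep the left columns and
        # append columns found only on the right (Spark fills gaps with nulls).
        return list(left) + [c for c in right if c not in left]
    if mode == "intersection":
        # The additional strategy requested in the issue: keep only the
        # columns present on both sides, in the left side's order.
        return [c for c in left if c in right]
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    a = ["id", "name"]
    b = ["id", "age"]
    print(resolve_union_columns(a, b, "union"))
    print(resolve_union_columns(a, b, "intersection"))
```

With the existing API, the "intersection" behaviour can likely be approximated by selecting the common columns on both dataframes before calling unionByName; a native mode parameter would mainly remove that manual step.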