Armand BERGES created SPARK-36858:
-------------------------------------

             Summary: Spark API to apply same function to multiple columns
                 Key: SPARK-36858
                 URL: https://issues.apache.org/jira/browse/SPARK-36858
             Project: Spark
          Issue Type: New Feature
          Components: Spark Core
    Affects Versions: 3.1.2, 2.4.8
            Reporter: Armand BERGES

Hi,

My team and I regularly need to apply the same function to multiple columns at once. For example, we want to remove all non-alphanumeric characters from each column of our DataFrames.

When we first hit this use case, some people on my team were using this kind of code:

{code:java}
val colListToClean = .... // Generate some list; it could be very long.
val dfToClean: DataFrame = ... // This is the DataFrame we want to clean.
def cleanFunction(colName: String): Column = ... // Some function that transforms a column based on its name.

val dfCleaned = colListToClean.foldLeft(dfToClean)((df, colName) => df.withColumn(colName, cleanFunction(colName)))
{code}

When applied to a large set of columns, this kind of code overloaded our driver, because each withColumn call generates a new DataFrame, one for every column to clean.

Based on this issue, we developed some code to add two functions:

* One to apply the same function to multiple columns
* One to rename multiple columns based on a Map

Has anyone ever asked your team to add this kind of API? If so, did you run into any issues with the implementation? If not, is this an idea you could add to Spark?

Best regards,

LvffY
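
For illustration only, the two helpers described above could be sketched as follows. This is not the reporter's actual implementation: the names applyToColumns and renameColumns are hypothetical, and the sketch assumes column names that contain no dots or other characters requiring backtick quoting. The idea is to build the full list of column expressions first and call select once, instead of chaining one withColumn call per column.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, regexp_replace}

object MultiColumnOps {

  // Hypothetical helper: applies `f` to every column listed in `targets`
  // using a single projection. Columns not listed are passed through unchanged,
  // and the original column order is preserved.
  def applyToColumns(df: DataFrame, targets: Seq[String])(f: String => Column): DataFrame = {
    val targetSet = targets.toSet
    val projected = df.columns.map { name =>
      if (targetSet.contains(name)) f(name).as(name) else col(name)
    }
    df.select(projected: _*)
  }

  // Hypothetical helper: renames columns according to `renames` (old name -> new name)
  // using a single projection. Columns absent from the map keep their name.
  def renameColumns(df: DataFrame, renames: Map[String, String]): DataFrame = {
    val projected = df.columns.map(name => col(name).as(renames.getOrElse(name, name)))
    df.select(projected: _*)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("multi-column-sketch")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(("a-1!", "b_2?", 1)).toDF("x", "y", "n")

    // Strip non-alphanumeric characters from columns x and y in one select.
    val cleaned = applyToColumns(df, Seq("x", "y")) { c =>
      regexp_replace(col(c), "[^a-zA-Z0-9]", "")
    }
    val renamed = renameColumns(cleaned, Map("x" -> "x_clean", "y" -> "y_clean"))

    renamed.show()
    spark.stop()
  }
}
```

Because every transformed column ends up in the same projection, the number of intermediate DataFrames stays constant regardless of how long the column list is.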