Re: flatMap for dataframe
One way is split -> explode -> pivot. These are column functions and DataFrame methods. Here are quick examples from the web:

https://www.google.com/amp/s/sparkbyexamples.com/spark/spark-split-dataframe-column-into-multiple-columns/amp/
https://www.google.com/amp/s/sparkbyexamples.com/spark/explode-spark-array-and-map-dataframe-column/amp/

On Wed, 9 Feb 2022, 01:55 frakass wrote:
> Hello
>
> for the RDD I can apply flatMap method:
>
> >>> sc.parallelize(["a few words","ba na ba na"]).flatMap(lambda x: x.split(" ")).collect()
> ['a', 'few', 'words', 'ba', 'na', 'ba', 'na']
>
> But for a dataframe table how can I flatMap that as above?
>
> >>> df.show()
> +----------------+
> |           value|
> +----------------+
> |     a few lines|
> |hello world here|
> |     ba na ba na|
> +----------------+
>
> Thanks
Re: flatMap for dataframe
Is this the Scala syntax? Yes, in Scala I know how to do it by converting the df to a Dataset. How do I do it in PySpark?

Thanks

On 2022/2/9 10:24, oliver dd wrote:
> df.flatMap(row => row.getAs[String]("value").split(" "))
Re: flatMap for dataframe
Hi,

You can achieve your goal with:

    df.flatMap(row => row.getAs[String]("value").split(" "))

Best Regards,
oliverdding
flatMap for dataframe
Hello

for the RDD I can apply flatMap method:

>>> sc.parallelize(["a few words","ba na ba na"]).flatMap(lambda x: x.split(" ")).collect()
['a', 'few', 'words', 'ba', 'na', 'ba', 'na']

But for a dataframe table how can I flatMap that as above?

>>> df.show()
+----------------+
|           value|
+----------------+
|     a few lines|
|hello world here|
|     ba na ba na|
+----------------+

Thanks
Re: Examples of flatMap in dataFrame
Hi,

You are looking for the explode method (in the DataFrame API starting with 1.3, I believe):

https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala#L1002

Ram

On Sun, Jun 7, 2015 at 9:22 PM, Dimp Bhat dimp201...@gmail.com wrote:
> Hi,
>
> I'm trying to write a custom transformer in Spark ML and, since that uses DataFrames, I am trying to use the flatMap function in the DataFrame class in Java. Can you share a simple example of how to use the flatMap function to do a word count on a single column of the DataFrame?
>
> Thanks
> Dimple
FlatMap in DataFrame
Hi,

I'm trying to write a custom transformer in Spark ML and, since that uses DataFrames, I am trying to use the flatMap function in the DataFrame class in Java. Can you share a simple example of how to use the flatMap function to do a word count on a single column of the DataFrame?

Thanks.
Dimple
Examples of flatMap in dataFrame
Hi,

I'm trying to write a custom transformer in Spark ML and, since that uses DataFrames, I am trying to use the flatMap function in the DataFrame class in Java. Can you share a simple example of how to use the flatMap function to do a word count on a single column of the DataFrame?

Thanks
Dimple