Re: how to create List in pyspark

2017-04-28 Thread Felix Cheung
Why not use the SQL functions explode and split? They would perform better and be more stable than a UDF.

From: Yanbo Liang <yblia...@gmail.com>
Sent: Thursday, April 27, 2017 7:34:54 AM
To: Selvam Raman
Cc: user
Subject: Re: how to create List in pyspark

> You can try with UDF

Re: how to create List in pyspark

2017-04-27 Thread Yanbo Liang
You can try with a UDF, like the following code snippet:

from pyspark.sql.functions import udf
from pyspark.sql.types import ArrayType, StringType

df = spark.read.text("./README.md")
split_func = udf(lambda text: text.split(" "), ArrayType(StringType()))
df.withColumn("split_value", split_func(df["value"]))
# Note: the archived message was truncated after "split_value",; the
# split_func(df["value"]) argument is the natural completion, since
# spark.read.text names its single column "value".

how to create List in pyspark

2017-04-24 Thread Selvam Raman
documentDF = spark.createDataFrame([
    ("Hi I heard about Spark".split(" "), ),
    ("I wish Java could use case classes".split(" "), ),
    ("Logistic regression models are neat".split(" "), )
], ["text"])

How can I achieve the same df while I am reading from source? doc =