Why not use the SQL functions explode and split?
They would perform better and be more stable than a UDF.
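For instance, a minimal sketch using the built-in split (assuming the same README.md input as in the reply below, where spark.read.text yields a single "value" column):

from pyspark.sql.functions import split

df = spark.read.text("./README.md")
# split runs natively on the JVM, so rows never round-trip through Python
df.withColumn("split_value", split(df["value"], " ")).show()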
From: Yanbo Liang <yblia...@gmail.com>
Sent: Thursday, April 27, 2017 7:34:54 AM
To: Selvam Raman
Cc: user
Subject: Re: how to create List in pyspark
You can try with a UDF, like the following code snippet:
from pyspark.sql.functions import udf
from pyspark.sql.types import ArrayType, StringType
df = spark.read.text("./README.md")
# UDF that splits each line of text into an array of words
split_func = udf(lambda text: text.split(" "), ArrayType(StringType()))
df.withColumn("split_value", split_func(df["value"])).show()
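This adds a split_value column of type array<string> alongside the original value column that spark.read.text produces.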
documentDF = spark.createDataFrame([
("Hi I heard about Spark".split(" "), ),
("I wish Java could use case classes".split(" "), ),
("Logistic regression models are neat".split(" "), )
], ["text"])
How can I achieve the same df while I am reading from a source?
doc =