I figured it out. Here is how it's done:

from pyspark.sql.functions import udf

replaceFunction = udf(lambda columnValue: columnValue.replace("\n", " ").replace('\r', " "))
df.withColumn('strReplaced', replaceFunction(df["str"]))

On 10 February 2016 at 13:04, <ndj...@gmail.com> wrote:

> Hi Viktor,
>
> Try to create a UDF. It's quite simple!
>
> Ardo.
>
>
> On 10 Feb 2016, at 10:34, Viktor ARDELEAN <viktor0...@gmail.com> wrote:
>
> Hello,
>
> I want to add a new String column to the dataframe based on an existing
> column values:
>
> from pyspark.sql.functions import lit
>
> df.withColumn('strReplaced', lit(df.str.replace("a", "b").replace("c", "d")))
>
> So basically I want to add a new column named "strReplaced", that is the same
> as the "str" column, just with character "a" replaced with "b" and "c"
> replaced with "d".
>
> When I try the code above I get following error:
>
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> AttributeError: 'Column' object has no attribute 'replace'
>
>
> So in fact I need somehow to get the value of the column df.str in order to
> call replace on it.
>
> Any ideas how to do this?
> --
> Viktor ARDELEAN
>
> *P* Don't print this email, unless it's really necessary. Take care of
> the environment.

--
Viktor ARDELEAN
*P* Don't print this email, unless it's really necessary. Take care of the
environment.
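
For reference, the lambda in the reply above can also be written as a named function. This is only a sketch: the name replace_line_breaks and the None check are assumptions added here, not part of the original reply. The check is there because a null value in the column would otherwise raise AttributeError inside the UDF.

```python
def replace_line_breaks(value):
    """Replace newline and carriage-return characters with spaces.

    Hypothetical helper (not from the original reply). None is passed
    through unchanged so that null column values do not raise
    AttributeError when the function runs inside a UDF.
    """
    if value is None:
        return None
    return value.replace("\n", " ").replace("\r", " ")


# The function is plain Python, so it can be tried without a Spark session.
print(replace_line_breaks("line one\nline two"))
print(replace_line_breaks(None))
```

To use it on the dataframe, it would be wrapped the same way as the lambda, for example:

    replaceFunction = udf(replace_line_breaks)
    df = df.withColumn('strReplaced', replaceFunction(df["str"]))

withColumn returns a new dataframe rather than modifying df in place, so the result needs to be assigned. Depending on the Spark version, the built-in regexp_replace from pyspark.sql.functions may do the same job without a Python UDF.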