Re: Pyspark - How to add new column to dataframe based on existing column value

Viktor ARDELEAN Wed, 10 Feb 2016 04:50:35 -0800

I figured it out.
Here is how it's done:

from pyspark.sql.functions import udf
replaceFunction = udf(lambda columnValue: columnValue.replace("\n", " ").replace('\r', " "))


# withColumn returns a new DataFrame, so keep the result
df = df.withColumn('strReplaced', replaceFunction(df["str"]))
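
One caveat with the lambda above: if the "str" column contains nulls, Spark hands them to the Python function as None, and calling .replace on None raises an error. Below is a minimal null-safe sketch of the same idea. The names replace_line_breaks and add_str_replaced are made up for illustration, and df is assumed to be a DataFrame with a string column called "str".

```python
def replace_line_breaks(value):
    """Replace newline and carriage-return characters with spaces.

    None is returned unchanged, because Spark passes SQL NULLs
    to a Python UDF as None.
    """
    if value is None:
        return None
    return value.replace("\n", " ").replace("\r", " ")


def add_str_replaced(df):
    """Return a new DataFrame with a 'strReplaced' column derived from 'str'.

    Hypothetical helper: assumes df has a string column named 'str'.
    """
    # Imported here so replace_line_breaks can be unit-tested without Spark.
    from pyspark.sql.functions import udf
    from pyspark.sql.types import StringType

    replace_udf = udf(replace_line_breaks, StringType())
    # DataFrames are immutable, so withColumn returns a new DataFrame.
    return df.withColumn("strReplaced", replace_udf(df["str"]))
```

Keeping the logic in a plain function means it can be checked in an ordinary Python session before it is wrapped in udf, for example by calling replace_line_breaks on a short test string.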


On 10 February 2016 at 13:04, <ndj...@gmail.com> wrote:

> Hi Viktor,
>
> Try to create a UDF. It's quite simple!
>
> Ardo.
>
>
> On 10 Feb 2016, at 10:34, Viktor ARDELEAN <viktor0...@gmail.com> wrote:
>
> Hello,
>
> I want to add a new String column to the dataframe based on the values of
> an existing column:
>
> from pyspark.sql.functions import lit
>
> df.withColumn('strReplaced', lit(df.str.replace("a", "b").replace("c", "d")))
>
> So basically I want to add a new column named "strReplaced", that is the same 
> as the "str" column, just with character "a" replaced with "b" and "c" 
> replaced with "d".
>
> When I try the code above I get the following error:
>
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> AttributeError: 'Column' object has no attribute 'replace'
>
>
> So in fact I somehow need to get the value of the column df.str in order to
> call replace on it.
>
> Any ideas how to do this?
> --
> Viktor ARDELEAN
>
> *P*   Don't print this email, unless it's really necessary. Take care of
> the environment.
>
>
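
For the single-character case in the original question ("a" to "b", "c" to "d"), a Python UDF may not be needed at all. The error appears because df.str is a Column expression, not a Python string, so the replacement has to be expressed as a column function. One option is the built-in translate from pyspark.sql.functions, which maps each character of its second argument to the character at the same position in its third. The sketch below assumes df has a string column named "str". The helper names translate_args and add_translated_column are hypothetical.

```python
def translate_args(replacements):
    """Build the (matching, replace) strings that translate() expects.

    `replacements` maps single characters to single characters,
    e.g. {"a": "b", "c": "d"} becomes ("ac", "bd").
    """
    matching = "".join(replacements)
    replace = "".join(replacements[ch] for ch in matching)
    return matching, replace


def add_translated_column(df, replacements):
    """Return a new DataFrame with 'strReplaced' built from column 'str'.

    Hypothetical helper: assumes df has a string column named 'str'.
    """
    # Imported here so translate_args can be tested without Spark.
    from pyspark.sql.functions import translate

    matching, replace = translate_args(replacements)
    return df.withColumn("strReplaced", translate(df["str"], matching, replace))
```

Usage would look like df = add_translated_column(df, {"a": "b", "c": "d"}). This only covers one-character substitutions. For longer patterns, regexp_replace from the same module is the usual built-in alternative.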


-- 
Viktor ARDELEAN

