[ https://issues.apache.org/jira/browse/SPARK-19844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-19844.
----------------------------------
    Resolution: Incomplete

> UDF in when control function is executed before the when clause is evaluated.
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-19844
>                 URL: https://issues.apache.org/jira/browse/SPARK-19844
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, SQL
>    Affects Versions: 2.0.1, 2.1.0
>            Reporter: Franklyn Dsouza
>            Priority: Major
>              Labels: bulk-closed
>
> Sometimes we guard the argument to a UDF using
> {code}when(clause, udf).otherwise(default){code},
> but we've noticed that the UDF is sometimes run on data that shouldn't
> have matched the clause.
> Here's some code to reproduce the issue.
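> {code}
> from pyspark.sql import SparkSession
> from pyspark.sql import functions as F
> from pyspark.sql import types
>
> spark = SparkSession.builder.getOrCreate()
>
> # A single row whose 'r' column is null.
> schema = types.StructType([types.StructField('r', types.StringType())])
> df = spark.createDataFrame([{'r': None}], schema=schema)
>
> # This UDF fails on None; the when() below is meant to keep nulls away from it.
> simple_udf = F.udf(lambda ref: ref.strip("/"), types.StringType())
>
> df.withColumn('test',
>               F.when(F.col("r").isNotNull(), simple_udf(F.col("r")))
>                .otherwise(F.lit(None))
>              ).collect()
> {code}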
> This raises an exception because the UDF runs on null data:
> AttributeError: 'NoneType' object has no attribute 'strip'.
> So it looks like the UDF is evaluated before the clause in the when() is
> evaluated. Oddly enough, when I change {code}F.col("r").isNotNull(){code} to
> {code}df["r"] != None{code}, it works.
> This might be related to https://issues.apache.org/jira/browse/SPARK-13773
> and https://issues.apache.org/jira/browse/SPARK-15282.
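
The PySpark `udf()` documentation notes that user-defined functions do not support conditional expressions or short-circuiting in boolean expressions, and recommends incorporating the condition into the function itself when it can fail on special rows. Below is a minimal sketch of that workaround for the reproducer above, assuming the same single nullable string column `r`. The names `strip_slashes` and `add_test_column` are hypothetical, introduced only for illustration. PySpark is imported inside `add_test_column` so the pure-Python helper can be used without Spark installed.

```python
from typing import Optional


def strip_slashes(ref: Optional[str]) -> Optional[str]:
    # Hypothetical helper: the null check lives inside the function, so it is
    # safe even if Spark evaluates the UDF on rows the when() would exclude.
    if ref is None:
        return None
    return ref.strip("/")


def add_test_column(df):
    # Hypothetical wrapper; requires PySpark, imported lazily (see lead-in).
    from pyspark.sql import functions as F
    from pyspark.sql import types

    safe_udf = F.udf(strip_slashes, types.StringType())
    # The when() is now redundant for correctness, but harmless to keep.
    return df.withColumn(
        "test",
        F.when(F.col("r").isNotNull(), safe_udf(F.col("r"))).otherwise(F.lit(None)),
    )
```

On the `df["r"] != None` variant: comparing a column with NULL yields NULL under SQL three-valued logic, so that condition is never true, the `otherwise` branch is always chosen, and the `test` column is always null. It is therefore not equivalent to `isNotNull()`.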



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)