[jira] [Commented] (SPARK-22980) Using pandas_udf when inputs are not Pandas's Series or DataFrame

Hyukjin Kwon (JIRA) Sat, 13 Jan 2018 15:09:25 -0800

    [ https://issues.apache.org/jira/browse/SPARK-22980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16325371#comment-16325371 ]


Hyukjin Kwon commented on SPARK-22980:
--------------------------------------

These cases are already well explained in the documentation. A pandas_udf
expects Pandas Series as input and output. The newly added description adds a
few more details regarding the example above and what you said.

As documented for scalar vectorised UDFs, input and output of a pandas_udf are
designed to be Pandas Series so that operations are vectorised. Therefore, a
built-in function such as len() applied to the input operates on the whole
Pandas Series, not on each value, because the input is a Pandas Series. It
produces the expected results.
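A minimal pandas-only sketch of why len() behaves differently, assuming that all three rows of spark.range(3) land in a single Arrow batch and that the literal 'text' reaches the function as a Series repeating the value once per row. Actual Spark output depends on partitioning and batch size; x_batch and y_batch are illustrative names, not part of the PySpark API.

```python
import pandas as pd

def f(x, y):
    # Same body as the lambda in the issue: len(x) + y
    return len(x) + y

# Assumed single batch holding all three rows of spark.range(3).
# The literal 'text' is repeated once per row, as a Series.
x_batch = pd.Series(["text"] * 3)
y_batch = pd.Series([0, 1, 2])

# pandas_udf-style call: x is the whole Series, so len(x) is the
# number of rows in the batch (3), broadcast-added to y.
vectorized = f(x_batch, y_batch)

# udf-style call: x is a single string, so len(x) is len('text') == 4.
row_at_a_time = pd.Series([f(x, y) for x, y in zip(x_batch, y_batch)])

print(vectorized.tolist())     # [3, 4, 5]
print(row_at_a_time.tolist())  # [4, 5, 6]
```

With more partitions, each batch would be smaller, so the vectorised result would change while the row-at-a-time result would not.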

Neither case is an error; both produce meaningful results. The two kinds of UDF
behave differently, so they produce different results, and this is documented.
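If per-value semantics are wanted inside a pandas_udf, the body can use a Series-aware operation instead of the built-in len(). A minimal sketch, assuming pandas only; text_len_plus_id is a hypothetical name, and it would be wrapped as pandas_udf(text_len_plus_id, LongType()) in the same way as f in the issue description.

```python
import pandas as pd

def text_len_plus_id(x: pd.Series, y: pd.Series) -> pd.Series:
    # Series.str.len() returns the length of each string element,
    # so the result no longer depends on how many rows are in the batch.
    return x.str.len() + y

# Illustrative batch: literal 'text' repeated per row, ids 0..2.
x_batch = pd.Series(["text"] * 3)
y_batch = pd.Series([0, 1, 2])
result = text_len_plus_id(x_batch, y_batch)
print(result.tolist())  # [4, 5, 6]
```

This matches what the row-at-a-time udf computes for the same rows, regardless of batch size.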

> Using pandas_udf when inputs are not Pandas's Series or DataFrame
> -----------------------------------------------------------------
>
>                 Key: SPARK-22980
>                 URL: https://issues.apache.org/jira/browse/SPARK-22980
>             Project: Spark
>          Issue Type: Sub-task
>          Components: PySpark
>    Affects Versions: 2.3.0
>            Reporter: Xiao Li
>             Fix For: 2.3.0
>
>
> {noformat}
> from pyspark.sql.functions import pandas_udf
> from pyspark.sql.functions import col, lit
> from pyspark.sql.types import LongType
> df = spark.range(3)
> f = pandas_udf(lambda x, y: len(x) + y, LongType())
> df.select(f(lit('text'), col('id'))).show()
> {noformat}
> {noformat}
> from pyspark.sql.functions import udf
> from pyspark.sql.functions import col, lit
> from pyspark.sql.types import LongType
> df = spark.range(3)
> f = udf(lambda x, y: len(x) + y, LongType())
> df.select(f(lit('text'), col('id'))).show()
> {noformat}
> The results of pandas_udf are different from udf. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
