Vladimir Feinberg created SPARK-15809:
-----------------------------------------
Summary: PySpark SQL UDF default returnType
Key: SPARK-15809
URL: https://issues.apache.org/jira/browse/SPARK-15809
Project: Spark
Issue Type: Improvement
Components: PySpark
Reporter: Vladimir Feinberg
Priority: Minor
The current signature for the pyspark UDF creation function is:
{code:python}
pyspark.sql.functions.udf(f, returnType=StringType())
{code}
Is there a reason that {{returnType}} has a default value at all?
Returning a string doesn't strike me as so much more frequent a use case
than, say, returning an integer that it merits being the default.
In fact, it seems the only reason this default was chosen is that, if we
*had to choose* a default type, it would be {{StringType}}, because that's
what everything can be implicitly converted to.
As far as I can tell, though, the default has only two effects: (1) it causes
unintentional, annoying conversions to strings for new users, and (2) it makes
call sites less consistent (if people drop the type specification in order to
rely on the default).
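To make the two effects concrete, here is a rough pure-Python sketch that runs
without Spark. It is only a stand-in for the real behaviour: the names
{{udf_with_default}} and {{udf_required}} are hypothetical, and plain {{str}}
and {{int}} stand in for {{StringType}} and {{IntegerType}}. It assumes, as
described above, that the declared return type is applied to whatever the
function returns.

```python
# Pure-Python stand-in for the two signature styles; no Spark required.
# `udf_with_default` and `udf_required` are hypothetical names, not PySpark APIs.

def udf_with_default(f, return_type=str):
    """Mimics a factory whose return type silently defaults to string."""
    def wrapped(*args):
        return return_type(f(*args))
    return wrapped


def udf_required(f, return_type):
    """Same factory, but the caller must always state the return type."""
    def wrapped(*args):
        return return_type(f(*args))
    return wrapped


def add_one(x):
    return x + 1


implicit = udf_with_default(add_one)   # type omitted: result becomes a string
explicit = udf_required(add_one, int)  # type stated: result stays an integer

print(repr(implicit(1)))  # '2'
print(repr(explicit(1)))  # 2

try:
    udf_required(add_one)  # forgetting the type fails right at the call site
except TypeError as exc:
    print("TypeError:", exc)
```

With the required parameter, the omission shows up as an immediate
{{TypeError}} rather than as a column of strings discovered later.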