[ 
https://issues.apache.org/jira/browse/SPARK-16205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-16205:
---------------------------------
    Labels: bulk-closed  (was: )

> dict -> StructType conversion is undocumented
> ---------------------------------------------
>
>                 Key: SPARK-16205
>                 URL: https://issues.apache.org/jira/browse/SPARK-16205
>             Project: Spark
>          Issue Type: Documentation
>          Components: PySpark
>    Affects Versions: 2.0.0
>            Reporter: Max Moroz
>            Priority: Minor
>              Labels: bulk-closed
>
> According to the docs, StructType corresponds only to Python list and
> tuple. I accidentally returned a dict from a UDF whose return type was
> registered as StructType.
> Expected behavior: either (1) an exception is raised (if types are strictly
> checked); or (2) the dict is treated as an iterable, so the struct is built
> in arbitrary order from the dict's keys (horribly dangerous, but
> understandable).
> Actual behavior: the struct was built "properly", in the sense that the
> dict's keys were matched to the struct's field names and its values were
> used as the field values.
> This is wonderful, but as far as I can tell completely undocumented.
> {code}
> import pyspark.sql.functions as F
> import pyspark.sql.types as T
>
> df = sqlContext.createDataFrame([(1,), (2,)], ['value'])
> fields = 'abcdefgh'
>
> # Decorator factory: wraps a function as a UDF with the given return type.
> def udf(type_):
>     def to_udf(func):
>         return F.udf(func, type_)
>     return to_udf
>
> # Build a struct with one StringType field per letter in `fields`.
> struct = T.StructType()
> for c in fields:
>     struct.add(c, T.StringType())
>
> @udf(struct)
> def f(row):
>     # Return a dict rather than a tuple: keys are the field names,
>     # values are the corresponding uppercase letters.
>     return dict(zip(fields, fields.upper()))
>
> df.select(f('value')).show()
> # Output is unexpectedly meaningful: each struct field holds its
> # uppercase letter, i.e. the dict was matched to fields by name.
> {code}
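> The observed behavior can be illustrated with a minimal plain-Python
> sketch (this is NOT PySpark's internal implementation, just a model of
> the by-name matching the report describes: values are looked up by the
> struct's field names rather than taken positionally):
> {code}
> def dict_to_struct_values(d, field_names):
>     """Order dict values according to the struct's field order.
>
>     Keys are matched to field names; a missing key yields None,
>     mirroring how a struct field absent from the dict stays null.
>     """
>     return tuple(d.get(name) for name in field_names)
>
> field_names = list('abcdefgh')
> d = dict(zip('abcdefgh', 'ABCDEFGH'))
> print(dict_to_struct_values(d, field_names))
> # ('A', 'B', 'C', 'D', 'E', 'F', 'G', 'H')
> {code}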



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
