[
https://issues.apache.org/jira/browse/SPARK-16205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hyukjin Kwon updated SPARK-16205:
---------------------------------
Labels: bulk-closed (was: )
> dict -> StructType conversion is undocumented
> ---------------------------------------------
>
> Key: SPARK-16205
> URL: https://issues.apache.org/jira/browse/SPARK-16205
> Project: Spark
> Issue Type: Documentation
> Components: PySpark
> Affects Versions: 2.0.0
> Reporter: Max Moroz
> Priority: Minor
> Labels: bulk-closed
>
> According to the docs, StructType corresponds only to Python list and
> tuple. I accidentally returned a dict from a UDF whose return type was
> registered as StructType.
> Expected behavior: either (1) an exception is raised (if types are checked
> strictly); or (2) the dict is treated as an iterable, so the struct is
> built from the dict's keys in arbitrary order (horribly dangerous,
> but I'd understand).
> Actual behavior: the struct was created "properly", in the sense that keys
> were matched to the struct's field names, and the dict's values were used as
> the field values.
> This is wonderful, but completely undocumented as far as I can tell.
> {code}
> import pyspark.sql.functions as F
> import pyspark.sql.types as T
> df = sqlContext.createDataFrame([(1,), (2,)], ['value'])
> fields = 'abcdefgh'
> def udf(type_):
>     def to_udf(func):
>         return F.udf(func, type_)
>     return to_udf
> struct = T.StructType()
> for c in fields:
>     struct.add(c, T.StringType())
> @udf(struct)
> def f(row):
>     d = dict(zip(fields, fields.upper()))
>     return d
> df.select(f('value')).show()
> # output is unexpectedly meaningful, with uppercase letters as values
> {code}
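
The behavior reported above can be illustrated without Spark. The following is a minimal pure-Python sketch, not PySpark's actual implementation: `to_struct_values` is a hypothetical helper, and filling missing keys with `None` is an assumption made for illustration. It contrasts key-matched conversion of a dict with positional conversion of a list or tuple.

```python
def to_struct_values(obj, field_names):
    """Hypothetical helper: order the values of `obj` by struct field name.

    This only illustrates the reported behavior; it is not PySpark code.
    """
    if isinstance(obj, dict):
        # Keys are matched to field names, so dict ordering is irrelevant.
        # Assumption: a field with no matching key becomes None.
        return tuple(obj.get(name) for name in field_names)
    if isinstance(obj, (list, tuple)):
        # Lists and tuples are matched to fields purely by position.
        if len(obj) != len(field_names):
            raise ValueError("length does not match the number of fields")
        return tuple(obj)
    raise TypeError("cannot convert %r to a struct" % type(obj).__name__)


fields = "abcdefgh"
d = dict(zip(fields, fields.upper()))
# Same mapping, built in reverse key order.
reversed_d = dict(reversed(list(d.items())))

print(to_struct_values(d, fields))
print(to_struct_values(reversed_d, fields))
```

Both calls yield the values in field order, which matches the "unexpectedly meaningful" output described in the report: the result depends on the key names rather than on the order in which the dict was built.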