Sylvain Zimmer created SPARK-16700:
--------------------------------------

             Summary: StructType doesn't accept Python dicts anymore
                 Key: SPARK-16700
                 URL: https://issues.apache.org/jira/browse/SPARK-16700
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 2.0.0
            Reporter: Sylvain Zimmer


Hello,

I found this issue while testing my codebase with 2.0.0-rc5.

StructType in Spark 1.6.2 accepts the Python <dict> type, which is very handy. 
2.0.0-rc5 does not and raises a TypeError instead.

I don't know if this was intended, but I'd advocate for this behaviour to remain 
the same. MapType is probably wasteful when your key names never change, and 
switching to Python tuples would be cumbersome.

Here is a minimal script to reproduce the issue: 

{code:python}
from pyspark import SparkContext
from pyspark.sql import types as SparkTypes
from pyspark.sql import SQLContext


sc = SparkContext()
sqlc = SQLContext(sc)

struct_schema = SparkTypes.StructType([
    SparkTypes.StructField("id", SparkTypes.LongType())
])

rdd = sc.parallelize([{"id": 0}, {"id": 1}])

df = sqlc.createDataFrame(rdd, struct_schema)

print(df.collect())

# 1.6.2 prints [Row(id=0), Row(id=1)]

# 2.0.0-rc5 raises TypeError: StructType can not accept object {'id': 0} in
# type <type 'dict'>

{code}
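
For illustration, the tuple-based alternative mentioned above might look roughly like the sketch below. It is plain Python with no Spark dependency; `dict_to_row_tuple` and the hard-coded `field_names` list are hypothetical names introduced here, not part of PySpark.

```python
def dict_to_row_tuple(record, field_names):
    """Return the values of `record` as a tuple ordered by `field_names`.

    Keys missing from `record` become None.
    """
    return tuple(record.get(name) for name in field_names)


# Field order of the schema from the script above. With PySpark this list
# should be available as struct_schema.names (assumption: not re-tested here).
field_names = ["id"]

records = [{"id": 0}, {"id": 1}]
rows = [dict_to_row_tuple(r, field_names) for r in records]
print(rows)  # [(0,), (1,)]
```

In the script above, this would presumably be applied as `rdd.map(lambda r: dict_to_row_tuple(r, struct_schema.names))` before calling `createDataFrame`. I have not verified that against 2.0.0-rc5, and it adds an extra pass over every record, which is the kind of overhead I would like to avoid.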

Thanks!



