Hello,
Spark newbie here :)
Why can't I create a DataFrame with just one column?
For instance, this works:
df=spark.createDataFrame([("apple",2),("orange",3)],["name","count"])
But this doesn't work:
df=spark.createDataFrame([("apple"),("orange")],["name"])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/spark/python/pyspark/sql/session.py", line 675, in createDataFrame
    return self._create_dataframe(data, schema, samplingRatio, verifySchema)
  File "/opt/spark/python/pyspark/sql/session.py", line 700, in _create_dataframe
    rdd, schema = self._createFromLocal(map(prepare, data), schema)
  File "/opt/spark/python/pyspark/sql/session.py", line 512, in _createFromLocal
    struct = self._inferSchemaFromList(data, names=schema)
  File "/opt/spark/python/pyspark/sql/session.py", line 439, in _inferSchemaFromList
    schema = reduce(_merge_type, (_infer_schema(row, names) for row in data))
  File "/opt/spark/python/pyspark/sql/session.py", line 439, in <genexpr>
    schema = reduce(_merge_type, (_infer_schema(row, names) for row in data))
  File "/opt/spark/python/pyspark/sql/types.py", line 1067, in _infer_schema
    raise TypeError("Can not infer schema for type: %s" % type(row))
TypeError: Can not infer schema for type: <class 'str'>
How can I fix it?
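(For reference, the likely cause is that `("apple")` is not a one-element tuple in Python; the parentheses are just grouping, so the list holds bare strings and Spark cannot infer a Row schema from a `str`. A sketch of the fix, assuming the goal is a single string column and that `spark` is an existing SparkSession:)

```python
# ("apple") is still a str -- the parentheses are only grouping.
assert ("apple") == "apple"
# A trailing comma is what makes a one-element tuple.
assert type(("apple",)) is tuple

# Hypothetical fix (needs a live SparkSession bound to `spark`):
# df = spark.createDataFrame([("apple",), ("orange",)], ["name"])
# Equivalently, a list of single-item lists also works:
# df = spark.createDataFrame([["apple"], ["orange"]], ["name"])
```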
Thanks