[
https://issues.apache.org/jira/browse/SPARK-16170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15346420#comment-15346420
]
Federico Ponzi commented on SPARK-16170:
----------------------------------------
Hi, and thanks for the response.
I've set this as a bug instead of an improvement because if I do:
{code}
from pyspark.sql.types import StructType, StructField, LongType, StringType
# Same schema and sqlContext as in the report quoted below.
sch = StructType([StructField("id", LongType(), True),
                  StructField("rol", StringType(), True)])
i = [(1, "rol"), (2.4, "str")]
sqlContext.createDataFrame(i, schema=sch)
{code}
Running this, I get the following traceback:
{code}
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/spark/python/pyspark/sql/context.py", line 438, in createDataFrame
    rdd, schema = self._createFromLocal(map(prepare, data), schema)
  File "/usr/local/spark/python/pyspark/sql/context.py", line 306, in _createFromLocal
    data = list(data)
  File "/usr/local/spark/python/pyspark/sql/context.py", line 423, in prepare
    _verify_type(obj, schema)
  File "/usr/local/spark/python/pyspark/sql/types.py", line 1311, in _verify_type
    _verify_type(v, f.dataType, f.nullable)
  File "/usr/local/spark/python/pyspark/sql/types.py", line 1283, in _verify_type
    raise TypeError("%s can not accept object %r in type %s" % (dataType, obj, type(obj)))
TypeError: LongType can not accept object 2.4 in type <type 'float'>
{code}
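In the meantime, the only way I see to detect the records that read.json silently nulls out is to filter for all-null rows after the fact. A minimal sketch (my own helper, assuming a DataFrame built with the schema from the report below, and assuming no legitimate record has every column null):
{code}
from pyspark.sql.functions import col

def silently_dropped(df):
    """Return the rows where every column is null, i.e. the records the
    JSON reader rejected against the schema (assumption: a fully-null
    row only arises from a rejected record)."""
    cond = None
    for name in df.columns:
        c = col(name).isNull()
        cond = c if cond is None else (cond & c)
    return df.filter(cond)

# Usage: a non-zero count means some input records did not match the schema.
# silently_dropped(df).count()
{code}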
> Throw error when row is not schema-compatible
> ---------------------------------------------
>
> Key: SPARK-16170
> URL: https://issues.apache.org/jira/browse/SPARK-16170
> Project: Spark
> Issue Type: Improvement
> Reporter: Federico Ponzi
> Priority: Minor
>
> We are using Spark to import some data from MySQL.
> We recently found that many of our imports were useless because our
> import function was wrongly applying a LongType schema to a float column.
> Consider this example:
> {code}
> from pyspark.sql import SQLContext
> from pyspark.sql.types import *
> sqlContext = SQLContext(sc)
> sch = StructType([StructField("id", LongType(), True),
>                   StructField("rol", StringType(), True)])
> i = ['{"id": 1, "rol": "str"}', '{"id": 2.4, "rol": "str"}']
> rdd = sc.parallelize(i)
> df = sqlContext.read.json(rdd, schema=sch)
> print df.collect()
> {code}
> The output is:
> {code}
> [Row(id=1, rol=u'str'), Row(id=None, rol=None)]
> {code}
> Every column in the second row is null, not only id (which has the wrong
> datatype), and no error is raised.
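> A possible workaround (a sketch, and an assumption on my side: I have not
> verified that the JSON reader's mode option treats a type mismatch the same
> way as a malformed record) is to make the reader fail instead of nulling the row:
> {code}
> # Assumption: in FAILFAST mode the JSON reader raises on records it
> # cannot convert to the given schema, instead of returning null rows.
> df = sqlContext.read.option("mode", "FAILFAST").json(rdd, schema=sch)
> {code}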