[ https://issues.apache.org/jira/browse/SPARK-10392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Davies Liu resolved SPARK-10392.
--------------------------------
Resolution: Fixed
Fix Version/s: 1.6.0
Issue resolved by pull request 8556
[https://github.com/apache/spark/pull/8556]
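For context, the output quoted below shows where things go wrong: the JDBC reader surfaces DateType values in their internal form (days since the Unix epoch), so df.collect() yields date=0 instead of a datetime.date, and _verify_type then rejects the int. A minimal sketch of that internal/external mapping, assuming the days-since-epoch convention (the helper names here are mine, not Spark's):
{code}
import datetime

EPOCH = datetime.date(1970, 1, 1)

def date_to_internal(d):
    # datetime.date -> int (days since 1970-01-01)
    return (d - EPOCH).days

def internal_to_date(v):
    # int (days since 1970-01-01) -> datetime.date
    return EPOCH + datetime.timedelta(days=v)
{code}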
> Pyspark - Wrong DateType support on JDBC connection
> ---------------------------------------------------
>
> Key: SPARK-10392
> URL: https://issues.apache.org/jira/browse/SPARK-10392
> Project: Spark
> Issue Type: Bug
> Components: PySpark, SQL
> Affects Versions: 1.4.1
> Reporter: Maciej Bryński
> Fix For: 1.6.0
>
>
> I have the following problem. I created this table:
> {code}
> CREATE TABLE `spark_test` (
> `id` INT(11) NULL,
> `date` DATE NULL
> )
> COLLATE='utf8_general_ci'
> ENGINE=InnoDB
> ;
> INSERT INTO `spark_test` (`id`, `date`) VALUES (1, '1970-01-01');
> {code}
> Then I try to read the data back: the date '1970-01-01' comes out as an int. This
> makes the DataFrame incompatible with its own schema.
> {code}
> df = sqlCtx.read.jdbc("jdbc:mysql://host/sandbox?user=user&password=password", 'spark_test')
> print(df.collect())
> # [Row(id=1, date=0)]
> df = sqlCtx.createDataFrame(df.rdd, df.schema)
> ---------------------------------------------------------------------------
> TypeError                                 Traceback (most recent call last)
> <ipython-input-36-ebc1d94e0d8c> in <module>()
>       1 df = sqlCtx.read.jdbc("jdbc:mysql://a2.adpilot.co/sandbox?user=mbrynski&password=CebO3ax4", 'spark_test')
>       2 print(df.collect())
> ----> 3 df = sqlCtx.createDataFrame(df.rdd, df.schema)
>
> /mnt/spark/spark/python/pyspark/sql/context.py in createDataFrame(self, data, schema, samplingRatio)
>     402
>     403         if isinstance(data, RDD):
> --> 404             rdd, schema = self._createFromRDD(data, schema, samplingRatio)
>     405         else:
>     406             rdd, schema = self._createFromLocal(data, schema)
>
> /mnt/spark/spark/python/pyspark/sql/context.py in _createFromRDD(self, rdd, schema, samplingRatio)
>     296             rows = rdd.take(10)
>     297             for row in rows:
> --> 298                 _verify_type(row, schema)
>     299
>     300         else:
>
> /mnt/spark/spark/python/pyspark/sql/types.py in _verify_type(obj, dataType)
>    1152                             "length of fields (%d)" % (len(obj), len(dataType.fields)))
>    1153         for v, f in zip(obj, dataType.fields):
> -> 1154             _verify_type(v, f.dataType)
>    1155
>    1156
>
> /mnt/spark/spark/python/pyspark/sql/types.py in _verify_type(obj, dataType)
>    1136         # subclass of them can not be fromInternald in JVM
>    1137         if type(obj) not in _acceptable_types[_type]:
> -> 1138             raise TypeError("%s can not accept object in type %s" % (dataType, type(obj)))
>    1139
>    1140     if isinstance(dataType, ArrayType):
>
> TypeError: DateType can not accept object in type <class 'int'>
> {code}
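Until a release with the fix is available, one possible workaround is to rebuild the date column by hand before re-creating the DataFrame. A minimal sketch, assuming the raw int is the number of days since the Unix epoch (DateType's internal representation); the helper fix_row is hypothetical, not part of any Spark API:
{code}
import datetime

df = sqlCtx.read.jdbc("jdbc:mysql://host/sandbox?user=user&password=password", 'spark_test')

def fix_row(row):
    # Assumption: the DATE column arrives as an int counting days since 1970-01-01
    return (row.id, datetime.date(1970, 1, 1) + datetime.timedelta(days=row.date))

# Convert the raw rows, then re-apply the original schema
df2 = sqlCtx.createDataFrame(df.rdd.map(fix_row), df.schema)
{code}
With the example row above, date=0 maps back to datetime.date(1970, 1, 1), so the check in _verify_type passes.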