Maciej Bryński created SPARK-10392:
--------------------------------------
             Summary: Pyspark - Wrong DateType support
                 Key: SPARK-10392
                 URL: https://issues.apache.org/jira/browse/SPARK-10392
             Project: Spark
          Issue Type: Bug
          Components: PySpark, SQL
            Reporter: Maciej Bryński


I have the following problem.
I created a table.

{code}
CREATE TABLE `spark_test` (
	`id` INT(11) NULL,
	`date` DATE NULL
)
COLLATE='utf8_general_ci'
ENGINE=InnoDB
;
INSERT INTO `sandbox`.`spark_test` (`id`, `date`) VALUES (1, '1970-01-01');
{code}

Then I try to read the data, and the date '1970-01-01' is converted to an int. This makes the rdd incompatible with its own schema.

{code}
df = sqlCtx.read.jdbc("jdbc:mysql://host/sandbox?user=user&password=password", 'spark_test')
print(df.collect())
df = sqlCtx.createDataFrame(df.rdd, df.schema)

[Row(id=1, date=0)]
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-36-ebc1d94e0d8c> in <module>()
      1 df = sqlCtx.read.jdbc("jdbc:mysql://host/sandbox?user=user&password=password", 'spark_test')
      2 print(df.collect())
----> 3 df = sqlCtx.createDataFrame(df.rdd, df.schema)

/mnt/spark/spark/python/pyspark/sql/context.py in createDataFrame(self, data, schema, samplingRatio)
    402
    403         if isinstance(data, RDD):
--> 404             rdd, schema = self._createFromRDD(data, schema, samplingRatio)
    405         else:
    406             rdd, schema = self._createFromLocal(data, schema)

/mnt/spark/spark/python/pyspark/sql/context.py in _createFromRDD(self, rdd, schema, samplingRatio)
    296             rows = rdd.take(10)
    297             for row in rows:
--> 298                 _verify_type(row, schema)
    299
    300         else:

/mnt/spark/spark/python/pyspark/sql/types.py in _verify_type(obj, dataType)
   1152                                  "length of fields (%d)" % (len(obj), len(dataType.fields)))
   1153             for v, f in zip(obj, dataType.fields):
-> 1154                 _verify_type(v, f.dataType)
   1155
   1156

/mnt/spark/spark/python/pyspark/sql/types.py in _verify_type(obj, dataType)
   1136         # subclass of them can not be fromInternald in JVM
   1137         if type(obj) not in _acceptable_types[_type]:
-> 1138             raise TypeError("%s can not accept object in type %s" % (dataType, type(obj)))
   1139
   1140     if isinstance(dataType, ArrayType):

TypeError: DateType can not accept object in type <class 'int'>
{code}
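
The int in Row(id=1, date=0) is consistent with Spark's internal date representation, which counts days since 1970-01-01, so this particular date maps to 0. Below is a minimal sketch, in plain Python with no Spark dependency, of how a truthiness check in a converter could let that 0 through unconverted. Both function names are hypothetical, and whether PySpark's DateType conversion actually behaves this way here is an assumption; the traceback above does not confirm it.

```python
import datetime

# Ordinal of 1970-01-01; the internal date value is days since this epoch.
EPOCH_ORDINAL = datetime.date(1970, 1, 1).toordinal()


def from_internal_truthy(days):
    """Hypothetical converter that skips NULLs with a truthiness check."""
    # Pitfall: 0 is falsy, so the epoch date comes back as the raw int 0.
    return days and datetime.date.fromordinal(days + EPOCH_ORDINAL)


def from_internal_checked(days):
    """Hypothetical converter that treats only None as NULL."""
    if days is None:
        return None
    return datetime.date.fromordinal(days + EPOCH_ORDINAL)


if __name__ == "__main__":
    for days in (None, 0, 1):
        print(days, repr(from_internal_truthy(days)), repr(from_internal_checked(days)))
```

If this is what happens, only rows holding exactly '1970-01-01' would be affected, and any other date would convert normally. That can be checked by inserting a second row with a different date and collecting again.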