[ https://issues.apache.org/jira/browse/SPARK-7278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538051#comment-14538051 ]
Kalle Jepsen commented on SPARK-7278:
-------------------------------------
Shouldn't {{DateType}} at least find {{datetime.datetime}} acceptable?
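For context, this is the kind of construction I would expect to work (a minimal sketch, assuming a plain {{SQLContext}} bound to {{sqlContext}}; as far as I can tell the type verifier currently rejects {{datetime.datetime}} values for a {{DateType}} column):
{code:none}
import datetime
from pyspark.sql.types import StructType, StructField, DateType

schema = StructType([StructField('date', DateType(), True)])
# datetime.datetime values for a DateType column -- arguably these
# should be accepted (and truncated to dates) rather than rejected.
rdd = sc.parallelize([(datetime.datetime(2014, 11, 11, 0, 0),)])
df = sqlContext.createDataFrame(rdd, schema)
{code}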
> Inconsistent handling of dates in PySpark's Row object
> ------------------------------------------------------
>
> Key: SPARK-7278
> URL: https://issues.apache.org/jira/browse/SPARK-7278
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 1.3.1
> Reporter: Kalle Jepsen
>
> Consider the following Python code:
> {code:none}
> import datetime
> rdd = sc.parallelize([[0, datetime.date(2014, 11, 11)], [1, datetime.date(2015, 6, 4)]])
> df = rdd.toDF(schema=['rid', 'date'])
> row = df.first()
> {code}
> Accessing the {{date}} column via {{\_\_getitem\_\_}} returns a
> {{datetime.datetime}} instance
> {code:none}
> >>> row[1]
> datetime.datetime(2014, 11, 11, 0, 0)
> {code}
> while access via {{getattr}} returns a {{datetime.date}} instance:
> {code:none}
> >>> row.date
> datetime.date(2014, 11, 11)
> {code}
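> As a user-side workaround until this is resolved, one can normalize
> index access by hand (a small sketch; it simply drops the spurious
> midnight time component):
> {code:none}
> import datetime
>
> value = row[1]
> # __getitem__ hands back datetime.datetime; truncate it to a date.
> if isinstance(value, datetime.datetime):
>     value = value.date()
> assert value == row.date
> {code}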
> The problem seems to be that Java deserializes the {{datetime.date}}
> objects to {{datetime.datetime}}. This is taken care of
> [here|https://github.com/apache/spark/blob/master/python/pyspark/sql/_types.py#L1027]
> when using {{getattr}}, but is overlooked when directly accessing the tuple
> by index.
> Is there an easy way to fix this?
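> One possible direction (a rough sketch only; names like
> {{\_convert\_field}} and {{\_\_datatype\_\_}} are hypothetical, not the
> actual internals of {{\_types.py}}) would be to route {{\_\_getitem\_\_}}
> through the same per-field conversion that {{\_\_getattr\_\_}} already
> applies:
> {code:none}
> class Row(tuple):
>     def __getitem__(self, i):
>         value = tuple.__getitem__(self, i)
>         # Apply the same SQL-type -> Python-type conversion that the
>         # __getattr__ path uses, so both access styles agree.
>         return _convert_field(self.__datatype__.fields[i].dataType, value)
> {code}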