[ 
https://issues.apache.org/jira/browse/SPARK-7278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-7278:
-----------------------------------

    Assignee: Apache Spark

> Inconsistent handling of dates in PySpark's Row object
> ------------------------------------------------------
>
>                 Key: SPARK-7278
>                 URL: https://issues.apache.org/jira/browse/SPARK-7278
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 1.3.1
>            Reporter: Kalle Jepsen
>            Assignee: Apache Spark
>
> Consider the following Python code:
> {code:none}
> import datetime
> rdd = sc.parallelize([[0, datetime.date(2014, 11, 11)],
>                       [1, datetime.date(2015, 6, 4)]])
> df = rdd.toDF(schema=['rid', 'date'])
> row = df.first()
> {code}
> Accessing the {{date}} column via {{\_\_getitem\_\_}} returns a 
> {{datetime.datetime}} instance:
> {code:none}
> >>> row[1]
> datetime.datetime(2014, 11, 11, 0, 0)
> {code}
> while access via {{getattr}} returns a {{datetime.date}} instance:
> {code:none}
> >>> row.date
> datetime.date(2014, 11, 11)
> {code}
> The problem seems to be that Java deserializes the {{datetime.date}} 
> objects to {{datetime.datetime}}. This is taken care of 
> [here|https://github.com/apache/spark/blob/master/python/pyspark/sql/_types.py#L1027]
>  when using {{getattr}}, but is overlooked when directly accessing the tuple 
> by index.
> Is there an easy way to fix this?
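
The mismatch described above can be reproduced without Spark. The following is a minimal sketch, not PySpark's actual implementation: DateAwareRow, FIELDS, DATE_FIELDS and to_date are hypothetical names introduced only for illustration, and the schema is hard-coded as an assumption. It shows one possible approach, applying the same date conversion in both __getitem__ and __getattr__ so that index access and attribute access agree.

```python
import datetime


def to_date(value):
    """Collapse a datetime.datetime back to a datetime.date."""
    # datetime.datetime is a subclass of datetime.date, so test for it explicitly.
    if isinstance(value, datetime.datetime):
        return value.date()
    return value


class DateAwareRow(tuple):
    """Hypothetical stand-in for a Row; this is NOT pyspark.sql.Row."""

    # Assumed schema for this sketch: column names, and which columns hold dates.
    FIELDS = ("rid", "date")
    DATE_FIELDS = frozenset(["date"])

    def __getitem__(self, index):
        value = tuple.__getitem__(self, index)
        # Convert only on plain integer access; slices are returned unchanged.
        if isinstance(index, int) and self.FIELDS[index] in self.DATE_FIELDS:
            return to_date(value)
        return value

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, i.e. for column names.
        try:
            index = self.FIELDS.index(name)
        except ValueError:
            raise AttributeError(name)
        # Delegate to __getitem__ so both access paths share one conversion.
        return self[index]


# Simulate a row whose date column came back as a datetime.datetime.
row = DateAwareRow((0, datetime.datetime(2014, 11, 11, 0, 0)))
print(repr(row[1]))    # datetime.date(2014, 11, 11)
print(repr(row.date))  # datetime.date(2014, 11, 11)
```

Routing attribute access through __getitem__ keeps the conversion in one place, so the two access paths cannot drift apart. Whether a fix in Spark itself should convert at access time or earlier, during deserialization, is left open here.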



