[ 
https://issues.apache.org/jira/browse/SPARK-2674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14075272#comment-14075272
 ] 

Davies Liu commented on SPARK-2674:
-----------------------------------

Python date and time objects are converted into java.util.Calendar after 
deserialization, so there are several ways to support them:

1. Add a CalendarType to Catalyst, similar to Timestamp
2. Add a transform stage to inferSchema that converts Calendar into 
Timestamp
3. Extend Pyrolite to create a Timestamp object when it meets datetime, date, 
or time

In the short term, 2 is probably the easiest. In the long term, 3 is probably 
the best.
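
To make option 2 concrete, here is a rough sketch of what such a transform stage might look like on the JVM side. It is only an illustration, not Spark or Pyrolite code: the class CalendarToTimestamp and the methods toTimestamp and transformRow are hypothetical names, and a row is assumed to be a plain Map of column name to value.

```java
import java.sql.Timestamp;
import java.util.Calendar;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TimeZone;

// Hypothetical helper for illustration; not part of Spark or Pyrolite.
final class CalendarToTimestamp {

    // Convert one Calendar to a Timestamp for the same instant.
    // Calendar only carries millisecond precision.
    static Timestamp toTimestamp(Calendar cal) {
        return new Timestamp(cal.getTimeInMillis());
    }

    // Return a copy of the row with every Calendar value replaced by a
    // Timestamp; all other values are passed through unchanged.
    static Map<String, Object> transformRow(Map<String, Object> row) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : row.entrySet()) {
            Object v = e.getValue();
            out.put(e.getKey(), v instanceof Calendar ? toTimestamp((Calendar) v) : v);
        }
        return out;
    }

    public static void main(String[] args) {
        // Build a Calendar like the one Pyrolite produces for a Python datetime.
        Calendar cal = Calendar.getInstance(TimeZone.getTimeZone("UTC"));
        cal.clear();
        cal.set(2014, Calendar.APRIL, 22, 0, 0, 0);

        Map<String, Object> row = new LinkedHashMap<>();
        row.put("name", "example");
        row.put("created", cal);

        // Print the runtime class of the converted value.
        System.out.println(transformRow(row).get("created").getClass().getName());
    }
}
```

Since java.util.Calendar holds only milliseconds, a conversion done at this stage cannot recover the microseconds of a Python datetime; doing the conversion inside Pyrolite (option 3) would be the place to preserve them.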


> Add date and time types to inferSchema
> --------------------------------------
>
>                 Key: SPARK-2674
>                 URL: https://issues.apache.org/jira/browse/SPARK-2674
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 1.0.0
>            Reporter: Hossein Falaki
>            Assignee: Davies Liu
>
> When I try inferSchema in PySpark on an RDD of dictionaries that contain a 
> datetime.datetime object, I get the following exception:
> {code}
> Object of type 
> java.util.GregorianCalendar[time=?,areFieldsSet=false,areAllFieldsSet=false,lenient=true,zone=sun.util.calendar.ZoneInfo[id="Etc/UTC",offset=0,dstSavings=0,useDaylight=false,transitions=0,lastRule=null],firstDayOfWeek=1,minimalDaysInFirstWeek=1,ERA=?,YEAR=2014,MONTH=3,WEEK_OF_YEAR=?,WEEK_OF_MONTH=?,DAY_OF_MONTH=22,DAY_OF_YEAR=?,DAY_OF_WEEK=?,DAY_OF_WEEK_IN_MONTH=?,AM_PM=0,HOUR=0,HOUR_OF_DAY=0,MINUTE=0,SECOND=0,MILLISECOND=4,ZONE_OFFSET=?,DST_OFFSET=?]
>  cannot be used
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)
