[ 
https://issues.apache.org/jira/browse/SPARK-6289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14542807#comment-14542807
 ] 

Michael Nazario commented on SPARK-6289:
----------------------------------------

I still see the same problem in my tests. Here is how I reproduce it.

I start up a Spark context and load a Spark DataFrame from an Avro file. The 
DataFrame has a bunch of simple types. These are the results I get:
{code}
>>> print(df)
DataFrame[a: string, Boolean: boolean, BigDecimal: decimal(10,0), DateTime: timestamp, LocalDate: date, Double: double, Float: float, Integer: int, Long: bigint, String: string]
>>> print(row)
Row(a=u'0', Boolean=True, BigDecimal=Decimal('1.0'), DateTime=datetime.datetime(2000, 1, 1, 3, 31), LocalDate=datetime.date(2000, 2, 2), Double=0.1, Float=0.20000000298023224, Integer=1, Long=2, String=u'foo')
>>> print(row.LocalDate)
2000-02-02
>>> print(type(row.LocalDate))
<type 'datetime.date'>
>>> print(row[4])
2000-02-02 00:00:00
>>> print(type(row[4]))
<type 'datetime.datetime'>
{code}
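
The mismatch above can be checked mechanically by comparing the type returned by attribute access with the type returned by positional access for every field. This is only a sketch: `access_type_mismatches` is a hypothetical helper name, not part of PySpark, and it assumes the row supports both `getattr` by field name and integer indexing.

```python
def access_type_mismatches(row, field_names):
    """List fields whose type differs between row.<name> and row[<index>].

    Hypothetical helper, not a PySpark API. `field_names` must be in the
    same order as the row's columns.
    """
    mismatches = []
    for index, name in enumerate(field_names):
        type_by_name = type(getattr(row, name))
        type_by_index = type(row[index])
        # Compare exact types: datetime.datetime is a subclass of
        # datetime.date, so an isinstance check would hide the problem.
        if type_by_name is not type_by_index:
            mismatches.append(
                (name, type_by_name.__name__, type_by_index.__name__))
    return mismatches
```

In the session above, something like `access_type_mismatches(row, df.columns)` should report only the `LocalDate` field if the behaviour is as shown.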

I've reproduced this problem with a much simpler piece of code in the pyspark 
shell:

{code}
>>> import pandas, datetime
>>> df = pandas.DataFrame([[datetime.datetime(1990, 1, 1), datetime.date(2000, 3, 3)]],
...                       columns=["foo", "bar"])
>>> sdf = sqlCtx.createDataFrame(df)
>>> sdf
DataFrame[foo: bigint, bar: date]
>>> row = sdf.first()
>>> row
Row(foo=631152000000000000, bar=datetime.date(2000, 3, 3))
>>> row[1]
datetime.datetime(2000, 3, 3, 0, 0)
>>> row.bar
datetime.date(2000, 3, 3)
{code}
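
Until this is fixed, one possible workaround is to coerce the affected values back to `datetime.date` after reading them by position. This is a minimal sketch, assuming the positions of the date columns are known from the schema; `normalize_date` and `normalize_row` are hypothetical helpers, not part of PySpark or Pyrolite.

```python
import datetime


def normalize_date(value):
    """Return a datetime.date when given a datetime.datetime.

    Hypothetical workaround helper. The exact type is checked because
    datetime.datetime is a subclass of datetime.date.
    """
    if type(value) is datetime.datetime:
        return value.date()
    return value


def normalize_row(values, date_positions):
    """Coerce only the values at `date_positions`, leaving the rest as-is.

    `date_positions` is assumed to hold the indexes of the date columns;
    genuine timestamp columns must not be listed, or they lose their time.
    """
    return tuple(
        normalize_date(value) if index in date_positions else value
        for index, value in enumerate(values)
    )
```

For the row above, `normalize_row(tuple(row), {1})` would be the intended call, with index 1 being the `bar` column.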

> PySpark doesn't maintain SQL date Types
> ---------------------------------------
>
>                 Key: SPARK-6289
>                 URL: https://issues.apache.org/jira/browse/SPARK-6289
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, SQL
>    Affects Versions: 1.2.1
>            Reporter: Michael Nazario
>            Assignee: Davies Liu
>
> For the DateType, Spark SQL requires a datetime.date in Python. However, 
> if you collect a row based on that type, you'll end up with a returned value 
> which is type datetime.datetime.
> I have tried to reproduce this using the pyspark shell, but have been unable 
> to. This is definitely a problem coming from pyrolite though:
> https://github.com/irmen/Pyrolite/
> Pyrolite is being used for datetime and date serialization, but it appears 
> to map dates to datetime objects rather than to date objects.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
