[
https://issues.apache.org/jira/browse/SPARK-6289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14542807#comment-14542807
]
Michael Nazario commented on SPARK-6289:
----------------------------------------
I still have the same problem in my tests. Here is how I reproduce it.
I start up a Spark context and load a Spark DataFrame from an Avro file. The
DataFrame has a bunch of simple types. These are the results I get:
{code}
>>> print(df)
DataFrame[a: string, Boolean: boolean, BigDecimal: decimal(10,0), DateTime:
timestamp, LocalDate: date, Double: double, Float: float, Integer: int, Long:
bigint, String: string]
>>> print(row)
Row(a=u'0', Boolean=True, BigDecimal=Decimal('1.0'),
DateTime=datetime.datetime(2000, 1, 1, 3, 31), LocalDate=datetime.date(2000, 2,
2), Double=0.1, Float=0.20000000298023224, Integer=1, Long=2, String=u'foo')
>>> print(row.LocalDate)
2000-02-02
>>> print(type(row.LocalDate))
<type 'datetime.date'>
>>> print(row[4])
2000-02-02 00:00:00
>>> print(type(row[4]))
<type 'datetime.datetime'>
{code}
I've reproduced this problem with a much simpler piece of code in the pyspark
shell:
{code}
>>> import pandas, datetime
>>> df = pandas.DataFrame([[datetime.datetime(1990, 1, 1), datetime.date(2000, 3, 3)]],
...                       columns=["foo", "bar"])
>>> sdf = sqlCtx.createDataFrame(df)
>>> sdf
DataFrame[foo: bigint, bar: date]
>>> row = sdf.first()
>>> row
Row(foo=631152000000000000, bar=datetime.date(2000, 3, 3))
>>> row[1]
datetime.datetime(2000, 3, 3, 0, 0)
>>> row.bar
datetime.date(2000, 3, 3)
{code}
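Until this is fixed, one possible stopgap is to convert the values back by hand after collecting a row. This is only a sketch of a workaround, not the actual fix: `restore_dates` and its `date_positions` argument are names made up for illustration, and the caller is assumed to know from the schema which columns are `date` columns (index 1 in the session above). It runs on plain Python values, so it does not need a Spark context.

```python
import datetime


def restore_dates(values, date_positions):
    """Return a copy of `values` with datetime.datetime entries at the given
    positions converted back to datetime.date.

    `date_positions` is a hypothetical argument: the indexes of the columns
    whose schema type is `date`.
    """
    fixed = list(values)
    for i in date_positions:
        v = fixed[i]
        # datetime.datetime is a subclass of datetime.date, so test for the
        # subclass explicitly; isinstance(v, datetime.date) is True for both.
        if isinstance(v, datetime.datetime):
            fixed[i] = v.date()
    return tuple(fixed)


# Values as returned by positional access (row[0], row[1]) in the session above.
raw = (631152000000000000, datetime.datetime(2000, 3, 3, 0, 0))
fixed = restore_dates(raw, [1])
print(type(fixed[1]).__name__, fixed[1])
```

Note that the check has to be against `datetime.datetime`, because a plain `isinstance(v, datetime.date)` test cannot tell the two types apart.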
> PySpark doesn't maintain SQL date Types
> ---------------------------------------
>
> Key: SPARK-6289
> URL: https://issues.apache.org/jira/browse/SPARK-6289
> Project: Spark
> Issue Type: Bug
> Components: PySpark, SQL
> Affects Versions: 1.2.1
> Reporter: Michael Nazario
> Assignee: Davies Liu
>
> For the DateType, Spark SQL requires a datetime.date in Python. However,
> if you collect a row based on that type, you'll end up with a returned value
> which is type datetime.datetime.
> I have tried to reproduce this using the pyspark shell, but have been unable
> to. This is definitely a problem coming from pyrolite though:
> https://github.com/irmen/Pyrolite/
> Pyrolite is being used for datetime and date serialization, but it appears
> to map dates to datetime objects rather than to date objects.