Quick follow-up. I'm trying to work around this myself in the meantime. The
goal is to qualify the TimestampValue with a timezone (by creating a new column
in the Arrow table based on the previous one). If this can be done before the
values are converted to Python, it may fix the issue I was having. But it
doesn't appear that I can create a new timestamp-typed column from the values
of the old timestamp column.
Here is the code I'm using:
import pyarrow as pa

def chunkedToArray(data):
    # Flatten a chunked column into individual values.
    for chunk in data.iterchunks():
        for value in chunk:
            yield value

def datetimeColumnsAddTimezone(table):
    # Replace each timezone-naive timestamp column with a GMT-qualified one.
    for i, field in enumerate(table.schema):
        if field.type == pa.timestamp('ns'):
            newField = pa.field(field.name, pa.timestamp('ns', tz='GMT'),
                                field.nullable, field.metadata)
            newArray = pa.array([val for val in chunkedToArray(table[i].data)],
                                pa.timestamp('ns', tz='GMT'))
            newColumn = pa.Column.from_array(newField, newArray)
            table = table.remove_column(i)
            table = table.add_column(i, newColumn)
    return table
Cheers, Lucas Pickup
From: Lucas Pickup [mailto:[email protected]]
Sent: Friday, August 25, 2017 3:23 PM
To: [email protected]
Subject: Reading Parquet datetime column gives different answer in Spark vs
PyArrow
Hi all,
I've been messing around with Spark and PyArrow Parquet reading. In my testing
I've found that a Parquet file written by Spark that contains a datetime column
yields different datetimes when read back by Spark and by PyArrow.
The attached script demonstrates this.
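(In outline, the script does something like the following; the path and session
setup here are placeholders standing in for the actual attachment.)

from pyspark.sql import SparkSession
import pyarrow.parquet as pq

spark = SparkSession.builder.getOrCreate()

# Read the Spark-written Parquet file back with Spark.
print(spark.read.parquet('dates.parquet').collect())

# Read the same file with PyArrow and convert to pandas.
table = pq.read_table('dates.parquet')
print(table.to_pandas())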
Output:
Spark Reading the parquet file into a DataFrame:
[Row(Date=datetime.datetime(2015, 7, 5, 23, 50)),
Row(Date=datetime.datetime(2015, 7, 5, 23, 30))]
PyArrow table has dates as UTC (7 hours ahead)
<pyarrow.lib.TimestampArray object at 0x0000029F3AFE79A8>
[
Timestamp('2015-07-06 06:50:00')
]
Pandas DF from pyarrow table has dates as UTC (7 hours ahead)
Date
0 2015-07-06 06:50:00
1 2015-07-06 06:30:00
I would've expected to end up with the same datetimes from both readers, since
there was no timezone attached at any point; it's just a date and time value.
Am I missing anything here? Or is this a bug?
Cheers, Lucas Pickup