nikfio commented on issue #41132:
URL: https://github.com/apache/arrow/issues/41132#issuecomment-2052518403
Hello @rok,
thank you for your help.
I tried the snippet you suggested, but executing the first operation still gives
the same error:
`ts2 = pc.strptime(pc.utf8_slice_codeunits('20090101 185956000',
0,
19),
format='%Y%m%d %H%M%S%f',
unit="ns")`
`*** pyarrow.lib.ArrowInvalid: Failed to parse string: '20090101 185956000'
as a scalar of type timestamp[ns]`
I know, and I am sorry: this date format (`%Y%m%d %H%M%S%f`) is a
real pain to work with.
I managed to work around the issue by first reading the timestamp as a
pyarrow string and then passing it through a datetime conversion using pandas
`to_datetime`. Finally, I convert the datetime array into a pyarrow array with
type `timestamp('ms')`.
1. Convert the timestamp column of type pyarrow string (call it `timestamp_str`)
to pandas datetime, using the date format wanted from the start:
`import pandas as pd`
`std_datetime = pd.to_datetime(timestamp_str.to_numpy(),
format='%Y%m%d %H%M%S%f')`
2. Convert back to a pyarrow `array`:
`import pyarrow as pa`
`timecol = pa.array(std_datetime,
type=pa.timestamp('ms'))`
3. Rebuild the table as wanted:
`target_schema = pa.schema([('timestamp', pa.timestamp('ms')), 'other
columns types'])`
`table = pa.Table.from_arrays(
[ timecol, 'other cols' ], schema=target_schema)`
I left some details out of point (3) to keep it short.
I hope it is understandable to everyone; otherwise let me know.
Thanks,
Nick