Re: [I] [Python] pyarrow compute strptime not working with format '%Y%m%d %H%M%S%f' [arrow]

via GitHub Fri, 12 Apr 2024 13:57:40 -0700


nikfio commented on issue #41132:
URL: https://github.com/apache/arrow/issues/41132#issuecomment-2052518403


   Hello @rok,
   
   thank you for your help.
   
   I tried the snippet you suggested, but executing the first operation still gives the same error:
   
   `ts2 = pc.strptime(pc.utf8_slice_codeunits('20090101 185956000', 0, 19),
                     format='%Y%m%d %H%M%S%f',
                     unit="ns")`
   `*** pyarrow.lib.ArrowInvalid: Failed to parse string: '20090101 185956000' as a scalar of type timestamp[ns]`
   
   
   I know, and I am sorry: this date format (`%Y%m%d %H%M%S%f`) is a pain in the ass.
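
For comparison, here is a minimal sketch of how Python's own `datetime.strptime` reads this format. It uses only the standard library and says nothing about how Arrow's parser works internally; the second sample string is a made-up value added for illustration. Python's `%f` accepts one to six digits and pads them on the right, so the trailing `000` is read as a zero fraction.

```python
from datetime import datetime

FMT = "%Y%m%d %H%M%S%f"

# The string from the error message above: no separators between fields.
raw = "20090101 185956000"
parsed = datetime.strptime(raw, FMT)
print(parsed.isoformat())  # 2009-01-01T18:59:56

# Hypothetical second value with a non-zero fraction.
# "250" is padded on the right, so it means 250 milliseconds.
raw_frac = "20090101 185957250"
parsed_frac = datetime.strptime(raw_frac, FMT)
print(parsed_frac.microsecond)  # 250000
```

So the format string itself is usable outside Arrow, which is why going through another parser works as a fallback.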
   
   I managed to work around the issue by first reading the timestamp as a pyarrow string and then passing it through a datetime conversion using pandas `to_datetime`. Finally, I convert the datetime array into a pyarrow array with type `timestamp('ms')`.
   
   1. Convert the timestamp column of type pyarrow string (call it `timestamp_str`) to a pandas datetime, using the date format wanted from the start:
   `import pandas as pd`
   `std_datetime = pd.to_datetime(timestamp_str.to_numpy(),
                                 format='%Y%m%d %H%M%S%f')`
   2. Convert back to a pyarrow `array`:
   `import pyarrow as pa`
   `timecol = pa.array(std_datetime, type=pa.timestamp('ms'))`
   3. Rebuild the table as wanted:
   `target_schema = pa.schema([('timestamp', pa.timestamp('ms')), 'other columns types'])`
   `table = pa.Table.from_arrays([timecol, 'other cols'], schema=target_schema)`
   
   I did not write out everything in point (3), to keep it short.
   I hope it is understandable to everyone; otherwise let me know.
   
   Thanks,
   Nick
   
   
   
   


