Joris Van den Bossche created ARROW-15883:
---------------------------------------------

             Summary: [C++] Support for fractional seconds in strptime() for ISO format?
                 Key: ARROW-15883
                 URL: https://issues.apache.org/jira/browse/ARROW-15883
             Project: Apache Arrow
          Issue Type: Improvement
          Components: C++
            Reporter: Joris Van den Bossche


Currently, we can't parse "our own" string representation of a timestamp array 
with the timestamp parser {{strptime}}:

{code:python}
import datetime
import pyarrow as pa
import pyarrow.compute as pc

>>> pa.array([datetime.datetime(2022, 3, 5, 9)])
<pyarrow.lib.TimestampArray object at 0x7f00c1d53dc0>
[
  2022-03-05 09:00:00.000000
]

# trying to parse the above representation as string
>>> pc.strptime(["2022-03-05 09:00:00.000000"], format="%Y-%m-%d %H:%M:%S", unit="us")
...
ArrowInvalid: Failed to parse string: '2022-03-05 09:00:00.000000' as a scalar of type timestamp[us]
{code}

The failure is caused by the fractional-second part; without it, parsing works:

{code:python}
>>> pc.strptime(["2022-03-05 09:00:00"], format="%Y-%m-%d %H:%M:%S", unit="us")
<pyarrow.lib.TimestampArray object at 0x7f00c1d6f940>
[
  2022-03-05 09:00:00.000000
]
{code}

I think this fails because {{strptime}} only supports parsing seconds as an 
integer (https://man7.org/linux/man-pages/man3/strptime.3.html).
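
To illustrate the integer-seconds limitation described above, here is a minimal sketch using Python's standard-library {{datetime.strptime}} as a stand-in. It is not Arrow's parser and not the C {{strptime}}; it only shows the same kind of failure when the format stops at {{%S}}. The {{%f}} directive used in the second call is a Python extension, and I am not assuming that Arrow's {{strptime}} accepts it.

```python
import datetime

text = "2022-03-05 09:00:00.000000"

# With a format that ends at %S, the trailing ".000000" is left unconsumed,
# so Python's parser raises ValueError.
try:
    datetime.datetime.strptime(text, "%Y-%m-%d %H:%M:%S")
    integer_seconds_ok = True
except ValueError:
    integer_seconds_ok = False

# Python's own %f directive (not part of POSIX strptime) consumes
# one to six fractional digits as microseconds.
parsed = datetime.datetime.strptime(text, "%Y-%m-%d %H:%M:%S.%f")
```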

But this creates a strange situation in which the timestamp parser cannot parse 
the representation we use for timestamps.

In addition, for CSV we have a custom ISO parser (used by default), so when 
parsing the strings while reading a CSV file, the same string with fractional 
seconds does work:

{code:python}
s = b"""a
2022-03-05 09:00:00.000000"""

import io
from pyarrow import csv

>>> csv.read_csv(io.BytesIO(s))
pyarrow.Table
a: timestamp[ns]
----
a: [[2022-03-05 09:00:00.000000000]]
{code}
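
One possible way to accept fractional seconds on top of an integer-seconds parser is to split off the fraction, parse the rest with the plain {{%S}} format, and add the fraction back afterwards. The sketch below does this with Python's standard library only. {{parse_with_fraction}} is a hypothetical helper, not an Arrow API, and it assumes the seconds field is the last field in the string and that no other "." appears before it.

```python
import datetime


def parse_with_fraction(text, fmt="%Y-%m-%d %H:%M:%S"):
    """Hypothetical helper: parse `text` whose seconds may carry a fraction.

    Assumes the fractional part, if any, follows the first "." and is the
    final component of the string.
    """
    whole, sep, frac = text.partition(".")
    # Parse everything up to the integer seconds with the given format.
    parsed = datetime.datetime.strptime(whole, fmt)
    if not sep:
        return parsed
    # Accept 1 to 6 ASCII digits (microsecond resolution at most).
    if not (frac.isascii() and frac.isdigit()) or len(frac) > 6:
        raise ValueError(f"invalid fractional seconds: {frac!r}")
    # Right-pad so that "25" means 250000 microseconds, not 25.
    micros = int(frac.ljust(6, "0"))
    return parsed + datetime.timedelta(microseconds=micros)
```

For example, {{parse_with_fraction("2022-03-05 09:00:00.000000")}} and {{parse_with_fraction("2022-03-05 09:00:00")}} both yield the same {{datetime}} value, which is the behaviour one would want for round-tripping the printed representation.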

cc [~apitrou] [~rokm]



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
