[jira] [Commented] (ARROW-9907) [Python] Failed to parse string into timestamp

Joris Van den Bossche (Jira) Tue, 16 Aug 2022 01:31:53 -0700


    [ 
https://issues.apache.org/jira/browse/ARROW-9907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17580154#comment-17580154
 ]


Joris Van den Bossche commented on ARROW-9907:
----------------------------------------------

Short update here clarifying the current status of this issue:

1) Parsing sub-second values (fractional seconds) is supported with the default 
ISO8601 parser:

{code}
import io
from pyarrow import csv

s = """col
2015-01-09 00:00:00.000"""

>>> csv.read_csv(io.BytesIO(s.encode()))
pyarrow.Table
col: timestamp[ns]
----
col: [[2015-01-09 00:00:00.000000000]]

>>> csv.read_csv(io.BytesIO(s.encode()),
...     convert_options=csv.ConvertOptions(timestamp_parsers=[csv.ISO8601]))
pyarrow.Table
col: timestamp[ns]
----
col: [[2015-01-09 00:00:00.000000000]]
{code}

2) Parsing does not yet work when the format is specified manually (the 
resulting type is string, not timestamp):

{code}
In [28]: csv.read_csv(io.BytesIO(s.encode()),
    ...:     convert_options=csv.ConvertOptions(timestamp_parsers=["%Y-%m-%d %H:%M:%S.%f"]))
Out[28]: 
pyarrow.Table
col: string
----
col: [["2015-01-09 00:00:00.000"]]
{code}

The same limitation can be seen directly in {{strptime}}, with or without 
{{%f}} in the format:

{code}
>>> import pyarrow.compute as pc
>>> pc.strptime("2015-01-09 00:00:00.000", format="%Y-%m-%d %H:%M:%S",
...             unit="ns")
...
ArrowInvalid: Failed to parse string: '2015-01-09 00:00:00.000' as a scalar of 
type timestamp[ns]

>>> pc.strptime("2015-01-09 00:00:00.000", format="%Y-%m-%d %H:%M:%S.%f",
...             unit="ns")
...
ArrowInvalid: Failed to parse string: '2015-01-09 00:00:00.000' as a scalar of 
type timestamp[ns]
{code}

Parsing fractional seconds in {{strptime}} is also covered by ARROW-10430 
and ARROW-15883.
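
For comparison, here is a minimal sketch showing that the same format string is accepted by Python's standard-library {{datetime.strptime}}, which does support {{%f}}. This is only a possible interim fallback, not something proposed in this issue: the helper name {{parse_fractional}} and the second sample value are made up for illustration, and parsing in pure Python will be much slower than Arrow's native parsers.

```python
from datetime import datetime

# Same format string that is passed to timestamp_parsers / strptime above.
FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def parse_fractional(values):
    """Parse timestamp strings with the stdlib parser, passing None through.

    Hypothetical helper, for illustration only.
    """
    return [None if v is None else datetime.strptime(v, FORMAT) for v in values]


# First value is the one from this issue; the second is an invented sample.
parsed = parse_fractional(
    ["2015-01-09 00:00:00.000", "2015-01-09 12:34:56.789", None]
)
print(parsed)
```

Assuming the column was first read as strings, the resulting list of {{datetime}} objects could then be handed to {{pa.array(parsed, type=pa.timestamp("ms"))}} to get a timestamp array.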

> [Python] Failed to parse string into timestamp
> ----------------------------------------------
>
>                 Key: ARROW-9907
>                 URL: https://issues.apache.org/jira/browse/ARROW-9907
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>            Reporter: Gary
>            Priority: Minor
>
> Hi,
> Not sure if I am missing something, but I am unable to get pyarrow to parse 
> my datetimes that are being inferred as strings, to be timestamps.
> My strings are arriving in CSVs with this format: '2015-01-09 00:00:00.000'
> I have tried:
> ```
> convert_opts = csv.ConvertOptions(timestamp_parsers=['%Y-%m-%d %H:%M:%S.%f'])
> df = csv.read_csv('path_to_csv', convert_options=convert_opts)
> print(df.schema)
> ```
> This yields no change and has my columns with these formatted timestamps 
> still showing as strings.
> Additionally, I have tried casting as well:
> ```
> dfschema = pa.schema([
> ('date_column', pa.timestamp('ms'))
> ])
> df = csv.read_csv('path_to_csv')
> df.cast(target_schema=dfschema)
> ```
> This way yields the error: "pyarrow.lib.ArrowInvalid: Failed to parse string: 
> 2015-01-09 00:00:00.000"
> I am using pyarrow=1.0.1 on a linux docker container.
> I tried to send an email to the users email list but it keeps returning a 
> Mailer Daemon error.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
