[
https://issues.apache.org/jira/browse/ARROW-9907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17580154#comment-17580154
]
Joris Van den Bossche commented on ARROW-9907:
----------------------------------------------
Short update here clarifying the current status of this issue:
1) Parsing sub-second values (fractional seconds) is supported with the default
ISO8601 parser, whether it is used implicitly or requested explicitly:
{code}
import io
from pyarrow import csv
s = """col
2015-01-09 00:00:00.000"""
>>> csv.read_csv(io.BytesIO(s.encode()))
pyarrow.Table
col: timestamp[ns]
----
col: [[2015-01-09 00:00:00.000000000]]
>>> csv.read_csv(io.BytesIO(s.encode()),
...     convert_options=csv.ConvertOptions(timestamp_parsers=[csv.ISO8601]))
pyarrow.Table
col: timestamp[ns]
----
col: [[2015-01-09 00:00:00.000000000]]
{code}
2) Parsing does not yet work when the format is specified manually with
{{%f}} (the resulting type is string, not timestamp):
{code}
In [28]: csv.read_csv(io.BytesIO(s.encode()),
    ...:     convert_options=csv.ConvertOptions(timestamp_parsers=["%Y-%m-%d %H:%M:%S.%f"]))
Out[28]:
pyarrow.Table
col: string
----
col: [["2015-01-09 00:00:00.000"]]
{code}
The same failure can be reproduced directly with {{strptime}}, both without
and with {{%f}} in the format:
{code}
>>> import pyarrow.compute as pc
>>> pc.strptime("2015-01-09 00:00:00.000", format="%Y-%m-%d %H:%M:%S",
...             unit="ns")
...
ArrowInvalid: Failed to parse string: '2015-01-09 00:00:00.000' as a scalar of
type timestamp[ns]
>>> pc.strptime("2015-01-09 00:00:00.000", format="%Y-%m-%d %H:%M:%S.%f",
...             unit="ns")
...
ArrowInvalid: Failed to parse string: '2015-01-09 00:00:00.000' as a scalar of
type timestamp[ns]
{code}
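For comparison only, here is a minimal sketch of how Python's standard-library {{datetime.strptime}} treats the same string. This is not Arrow code and says nothing about Arrow's internals; it only illustrates what the {{%f}} directive is expected to match. The variable names are illustrative.

```python
from datetime import datetime

value = "2015-01-09 00:00:00.000"

# With %f, the standard library consumes the fractional part as microseconds.
parsed = datetime.strptime(value, "%Y-%m-%d %H:%M:%S.%f")
print(parsed.isoformat(sep=" ", timespec="milliseconds"))

# Without %f, the trailing ".000" is left unconverted and parsing fails.
try:
    datetime.strptime(value, "%Y-%m-%d %H:%M:%S")
    failed_without_f = False
except ValueError:
    failed_without_f = True
print(failed_without_f)
```

The first call succeeds and the second raises, which matches what users coming from the standard library would expect from the two {{pc.strptime}} calls above.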
Parsing fractional seconds in strptime is also tracked in ARROW-10430 and
ARROW-15883.
> [Python] Failed to parse string into timestamp
> ----------------------------------------------
>
> Key: ARROW-9907
> URL: https://issues.apache.org/jira/browse/ARROW-9907
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Reporter: Gary
> Priority: Minor
>
> Hi,
> Not sure if I am missing something, but I am unable to get pyarrow to parse
> my datetimes that are being inferred as strings, to be timestamps.
> My strings are arriving in CSVs with this format: '2015-01-09 00:00:00.000'
> I have tried:
> ```
> convert_opts = csv.ConvertOptions(timestamp_parsers=['%Y-%m-%d %H:%M:%S.%f'])
> df = csv.read_csv('path_to_csv', convert_options=convert_opts)
> print(df.schema)
> ```
> This yields no change and has my columns with these formatted timestamps
> still showing as strings.
> Additionally, I have tried casting as well:
> ```
> dfschema = pa.schema([
>     ('date_column', pa.timestamp('ms'))
> ])
> df = csv.read_csv('path_to_csv')
> df.cast(target_schema=dfschema)
> ```
> This way yields the error: "pyarrow.lib.ArrowInvalid: Failed to parse string:
> 2015-01-09 00:00:00.000"
> I am using pyarrow=1.0.1 on a linux docker container.
> I tried to send an email to the users email list but it keeps returning a
> Mailer Daemon error.