[
https://issues.apache.org/jira/browse/ARROW-15665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17507479#comment-17507479
]
Joris Van den Bossche commented on ARROW-15665:
-----------------------------------------------
For case 3, pandas doesn't "roll over" a day that is too large for the
given month, but raises an error instead:
{code}
>>> import pandas as pd
>>> pd.to_datetime("1999-02-30", format="%Y-%m-%d")
...
ValueError: time data 1999-02-30 doesn't match format specified
{code}
And Python's stdlib seems to do the same:
{code}
>>> import datetime
>>> datetime.datetime.strptime("1999-02-30", "%Y-%m-%d")
...
ValueError: day is out of range for month
{code}
Arrow does indeed roll over:
{code}
>>> import pyarrow.compute as pc
>>> print(pc.strptime("1999-02-30", format="%Y-%m-%d", unit="s"))
1999-03-02 00:00:00
{code}
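To make the arithmetic behind that result concrete, here is a minimal sketch of how a roll-over can land on 1999-03-02. It is only an illustration, not Arrow's actual implementation, and {{rollover_date}} is a hypothetical helper: February 1999 has 28 days, so day 30 ends up two days past the end of the month.

```python
import datetime


def rollover_date(year, month, day):
    """Hypothetical helper: count `day - 1` days forward from the 1st of the month.

    A day number beyond the end of the month spills into the following month
    instead of raising an error.
    """
    return datetime.date(year, month, 1) + datetime.timedelta(days=day - 1)


print(rollover_date(1999, 2, 30))  # 1999-03-02
```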
Personally, I don't like that behaviour, but I suppose we get it from the
system {{strptime}}? (So it might even depend on your OS?)
It might be interesting to check what date.h's version of strptime does.
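For comparison, this is roughly what a strict parse with selectable error handling could look like on top of Python's stdlib, which already rejects "1999-02-30". This is a sketch under assumptions, not a proposal for the Arrow API: {{parse_strict}} and its {{errors}} argument are hypothetical, with mode names loosely modelled on the {{errors}} argument of {{pandas.to_datetime}}.

```python
import datetime


def parse_strict(value, fmt, errors="raise"):
    """Hypothetical helper: parse `value` with `fmt`, never rolling over.

    errors="raise"  -> propagate the ValueError from strptime
    errors="coerce" -> return None (standing in for NA) on failure
    errors="ignore" -> return the input string unchanged on failure
    """
    if errors not in ("raise", "coerce", "ignore"):
        raise ValueError(f"unknown errors mode: {errors!r}")
    try:
        return datetime.datetime.strptime(value, fmt)
    except ValueError:
        if errors == "raise":
            raise
        if errors == "coerce":
            return None
        return value


print(parse_strict("1999-02-28", "%Y-%m-%d"))                   # 1999-02-28 00:00:00
print(parse_strict("1999-02-30", "%Y-%m-%d", errors="coerce"))  # None
print(parse_strict("1999-02-30", "%Y-%m-%d", errors="ignore"))  # 1999-02-30
```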
> [C++] Add error handling option to StrptimeOptions
> --------------------------------------------------
>
> Key: ARROW-15665
> URL: https://issues.apache.org/jira/browse/ARROW-15665
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Reporter: Rok Mihevc
> Assignee: Rok Mihevc
> Priority: Major
> Labels: kernel, pull-request-available
> Time Spent: 40m
> Remaining Estimate: 0h
>
> We want to have an option to either raise, ignore or return NA in case of
> format mismatch.
> See
> [pandas.to_datetime|https://pandas.pydata.org/docs/reference/api/pandas.to_datetime.html]
> and lubridate
> [parse_date_time|https://lubridate.tidyverse.org/reference/parse_date_time.html]
> for examples.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)