[
https://issues.apache.org/jira/browse/ARROW-17398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17580292#comment-17580292
]
Rok Mihevc commented on ARROW-17398:
------------------------------------
We use [musl
strptime|https://github.com/esmil/musl/blob/master/src/time/strptime.c] on
Windows and [regular
strptime|https://man7.org/linux/man-pages/man3/strptime.3.html] elsewhere
([source|https://github.com/apache/arrow/blob/cc8f6c0267680d5f353766a3bafe5822b0ceb88f/cpp/src/arrow/util/value_parsing.h#L781]).
My assumption is that all timezone information is currently ignored. A simple
test (on M1):
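{code:python}
import pyarrow as pa
import pyarrow.compute as pc

# Parse a timestamp string that ends with a timezone abbreviation (%Z).
result = pc.strptime(pa.array(["2022-01-01 12:00 CET"]),
                     format="%Y-%m-%d %H:%M %Z", unit="ms")
print(result, result.type)
{code}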
This returns 2022-01-01 12:00:00.000 with type TimestampType(timestamp[ms]),
while I would expect 2022-01-01 11:00:00.000 with type
TimestampType(timestamp[ms], "UTC").
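To make the expected value concrete, here is a minimal standard-library sketch of the conversion described above. It is not Arrow code: the helper parse_with_zone and the lookup table ABBREV_OFFSETS are hypothetical names introduced only for illustration, and the table assumes CET is a fixed UTC+01:00 offset.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical lookup table (an assumption for this sketch):
# timezone abbreviation -> fixed offset from UTC.
ABBREV_OFFSETS = {
    "UTC": timedelta(0),
    "CET": timedelta(hours=1),
}


def parse_with_zone(text, fmt="%Y-%m-%d %H:%M"):
    """Parse 'YYYY-MM-DD HH:MM ABBREV' and return the instant in UTC."""
    # Split off the trailing abbreviation; the rest is a naive timestamp.
    stamp, _, abbrev = text.rpartition(" ")
    naive = datetime.strptime(stamp, fmt)
    # Attach the fixed offset for the abbreviation, then normalize to UTC.
    local = naive.replace(tzinfo=timezone(ABBREV_OFFSETS[abbrev], abbrev))
    return local.astimezone(timezone.utc)


expected = parse_with_zone("2022-01-01 12:00 CET")
print(expected.isoformat())  # 2022-01-01T11:00:00+00:00
```

The abbreviation is applied as an offset and the result is normalized to UTC, which is why 12:00 CET would become 11:00 UTC rather than staying at 12:00 with no zone attached.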
The date.h library we vendor [supports timezone
parsing|https://github.com/HowardHinnant/date/wiki/Examples-and-Recipes#parse_daylight_transition]
but is significantly slower, as per [this
comment|https://github.com/apache/arrow/blob/cc8f6c0267680d5f353766a3bafe5822b0ceb88f/cpp/src/arrow/util/value_parsing.h#L785].
I'm not sure how to continue, but I expect there is some significant work to be
done here.
> [C++] Add support for %Z to strptime
> -------------------------------------
>
> Key: ARROW-17398
> URL: https://issues.apache.org/jira/browse/ARROW-17398
> Project: Apache Arrow
> Issue Type: Improvement
> Components: R
> Reporter: Rok Mihevc
> Priority: Minor
> Labels: kernel
>
> While lubridate does not support the %Z flag for strptime, Arrow could.
> Changes to the C++ kernels might be required for support on all platforms, but
> that shouldn't block implementation, as the kStrptimeSupportsZone flag can be
> used; [see the
> proposal|https://github.com/apache/arrow/pull/13854#issuecomment-1212694663].