jorisvandenbossche commented on a change in pull request #11358:
URL: https://github.com/apache/arrow/pull/11358#discussion_r727129048
##########
File path: cpp/src/arrow/compute/kernels/scalar_string_test.cc
##########
@@ -1429,6 +1430,17 @@ TYPED_TEST(TestStringKernels, Strptime) {
this->CheckUnary("strptime", input1, timestamp(TimeUnit::MICRO), output1,
&options);
}
+TYPED_TEST(TestStringKernels, StrptimeZoneOffset) {
+ if (!arrow::internal::kStrptimeSupportsZone) {
+ GTEST_SKIP() << "strptime does not support %z on this platform";
+ }
+ std::string input1 = R"(["5/1/2020 +01", null, "12/11/1900 -01:30"])";
+ std::string output1 =
+ R"(["2020-04-30T23:00:00.000000", null, "1900-12-11T01:30:00.000000"])";
+ StrptimeOptions options("%m/%d/%Y %z", TimeUnit::MICRO);
+ this->CheckUnary("strptime", input1, timestamp(TimeUnit::MICRO), output1,
&options);
Review comment:
> And what about the third? Do we error? (This wasn't previously
> parseable before.) Joris touches on this above as well.
I think we should error by default, but ideally with an option to force
interpreting the values without an offset as UTC (or any specified timezone).
For context, that's also more or less what pandas does. It doesn't error,
but if you have a mixture of timezone-aware and naive timestamp strings while
trying to convert them with `pd.to_datetime(..)` or `pd.DatetimeIndex(..)`, it
will return an "object"-dtype array with a mixture of timezone-aware and naive
datetime objects (which is basically useless if you want to perform timestamp
operations on that column). In Arrow we of course don't have the concept of
"generic objects", so I think erroring is the closest alternative.
The aforementioned pandas functions have a `utc=True` option to explicitly
ask to return values with a proper timezone-aware datetime64 dtype (where the
naive datetimes are interpreted to be in UTC, and the aware datetimes with an
offset are converted to UTC).
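
To make the proposed semantics concrete, here is a rough sketch in plain Python (standard library only). It is not pandas or Arrow code: `parse_to_utc`, `assume_utc`, and the two format constants are made-up names for illustration. The offsets are written as `+0100` rather than `+01`, since I'm not sure Python's `%z` accepts the shorter form on every version.

```python
from datetime import datetime, timezone

# Hypothetical formats for illustration: one with an offset, one without.
FMT_AWARE = "%m/%d/%Y %z"
FMT_NAIVE = "%m/%d/%Y"


def parse_to_utc(values, assume_utc=False):
    """Parse date strings that may or may not carry a UTC offset.

    By default, a string without an offset raises ValueError (the
    "error by default" behavior). With assume_utc=True, such strings
    are interpreted as UTC, loosely mirroring pandas' `utc=True`.
    """
    out = []
    for value in values:
        if value is None:
            out.append(None)  # nulls pass through unchanged
            continue
        try:
            parsed = datetime.strptime(value, FMT_AWARE)
        except ValueError:
            # No offset present: fall back to the naive format.
            parsed = datetime.strptime(value, FMT_NAIVE)
        if parsed.tzinfo is None:
            if not assume_utc:
                raise ValueError(f"no UTC offset in {value!r}")
            parsed = parsed.replace(tzinfo=timezone.utc)
        # Aware values are converted to UTC.
        out.append(parsed.astimezone(timezone.utc))
    return out


mixed = ["5/1/2020 +0100", None, "12/11/1900 -0130", "1/2/2021"]
as_utc = parse_to_utc(mixed, assume_utc=True)
```

With `assume_utc=True` every non-null result is a UTC-aware datetime; with the default, the last element (no offset) makes the whole call raise, which is the behavior I'd suggest as Arrow's default.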