pitrou commented on a change in pull request #12536:
URL: https://github.com/apache/arrow/pull/12536#discussion_r833247348
##########
File path: cpp/src/arrow/config.h
##########
@@ -77,4 +86,13 @@ const BuildInfo& GetBuildInfo();
ARROW_EXPORT
RuntimeInfo GetRuntimeInfo();
+struct ArrowGlobalOptions {
+ /// Path to text timezone database. This is only configurable on Windows,
+ /// which does not have a compatible OS timezone database.
+ util::optional<std::string> tz_db_path;
Review comment:
Can we use a consistent naming here and above?
##########
File path: cpp/src/arrow/config.cc
##########
@@ -73,7 +76,30 @@ RuntimeInfo GetRuntimeInfo() {
MakeSimdLevelString([&](int64_t flags) { return
cpu_info->IsSupported(flags); });
info.detected_simd_level =
MakeSimdLevelString([&](int64_t flags) { return
cpu_info->IsDetected(flags); });
+ info.using_os_timezone_db = USE_OS_TZDB;
+#if !USE_OS_TZDB
+ info.timezone_db_path = timezone_db_path;
Review comment:
You shouldn't leave it uninitialized otherwise.
##########
File path: cpp/src/arrow/public_api_test.cc
##########
@@ -103,4 +104,40 @@ TEST(Misc, BuildInfo) {
ASSERT_THAT(info.full_so_version, ::testing::HasSubstr(info.so_version));
}
+TEST(Misc, SetTimzoneConfig) {
+#ifndef _WIN32
+ GTEST_SKIP() << "Can only set the Timezone database on Windows";
+#else
+#ifndef ARROW_FILESYSTEM
+ GTEST_SKIP() << "Need filesystem support to test timezone config.";
+#else
+ auto fs = std::make_shared<arrow::fs::LocalFileSystem>();
+ auto home_raw = std::getenv("USERPROFILE");
+ std::string home = home_raw == nullptr ? "~" : std::string(home_raw);
+ ASSERT_OK_AND_ASSIGN(std::string tzdata_dir,
+ fs->NormalizePath(home + "\\Downloads\\tzdata"));
Review comment:
This makes the test depend on specifics of where the timezone db was
downloaded.
Do we instead want to expose a `ARROW_TIMEZONE_DB_PATH` environment variable
that would automatically configure the right tz db path? Or is that overkill?
##########
File path: cpp/src/arrow/public_api_test.cc
##########
@@ -103,4 +104,40 @@ TEST(Misc, BuildInfo) {
ASSERT_THAT(info.full_so_version, ::testing::HasSubstr(info.so_version));
}
+TEST(Misc, SetTimzoneConfig) {
+#ifndef _WIN32
+ GTEST_SKIP() << "Can only set the Timezone database on Windows";
+#else
+#ifndef ARROW_FILESYSTEM
Review comment:
```suggestion
#elif !defined(ARROW_FILESYSTEM)
```
##########
File path: cpp/src/arrow/public_api_test.cc
##########
@@ -103,4 +104,40 @@ TEST(Misc, BuildInfo) {
ASSERT_THAT(info.full_so_version, ::testing::HasSubstr(info.so_version));
}
+TEST(Misc, SetTimzoneConfig) {
Review comment:
Typo here.
##########
File path: docs/source/cpp/build_system.rst
##########
@@ -163,3 +163,24 @@ can control the source of each dependency and whether it
is statically or
dynamically linked. See :doc:`/developers/cpp/building` for instructions. Or
alternatively, use Arrow from a package manager such as Conda or vcpkg which
will manage consistent versions of Arrow and its dependencies.
+
+
+Runtime Dependencies
+====================
+
+While Arrow uses the OS-provided timezone database on Linux and Mac OS, it
+requires a user-provided database on Windows. You must download and extract the
+text version of the IANA timezone database and add the Windows timezone mapping
+XML. To download, you can use the following batch script:
+
+.. literalinclude:: ../../../ci/appveyor-cpp-setup.bat
+ :language: cmd
+ :start-after: @rem (Doc section: Download timezone database)
+ :end-before: @rem (Doc section: Download timezone database)
Review comment:
I'd rather if you copied the relevant snippet here, we shouldn't ideally
rely on the contents of CI scripts (which may contain specific quirks that
irrelevant to normal user setups) for the public docs.
##########
File path: cpp/src/arrow/compute/kernels/scalar_temporal_test.cc
##########
@@ -1875,6 +1870,9 @@ TEST_F(ScalarTemporalTest, StrftimeCLocale) {
}
TEST_F(ScalarTemporalTest, StrftimeOtherLocale) {
+#ifdef _WIN32
+ GTEST_SKIP() << "There is a known bug in strftime for locales on Windows
(ARROW-15922)";
Review comment:
Is this on Windows or specifically MinGW? i.e., would the non-MinGW
Windows CI pass if you remove this skip? I'm wondering if we can narrow the
condition.
##########
File path: docs/source/cpp/build_system.rst
##########
@@ -163,3 +163,24 @@ can control the source of each dependency and whether it
is statically or
dynamically linked. See :doc:`/developers/cpp/building` for instructions. Or
alternatively, use Arrow from a package manager such as Conda or vcpkg which
will manage consistent versions of Arrow and its dependencies.
+
+
+Runtime Dependencies
+====================
+
+While Arrow uses the OS-provided timezone database on Linux and Mac OS, it
Review comment:
```suggestion
While Arrow uses the OS-provided timezone database on Linux and macOS, it
```
##########
File path: docs/source/developers/cpp/windows.rst
##########
@@ -362,6 +362,21 @@ suppress dllimport/dllexport marking of symbols. Projects
that statically link
against Arrow on Windows additionally need this definition. The Unix builds do
not use the macro.
+Downloading the Timezone Database
+=================================
+
+To run some of the compute unit tests on Windows, the IANA timezone database
+and the Windows timezone mapping need to be downloaded first. To download
+them to the default search directory (in the user Downloads directory), use
Review comment:
Can you just cross-reference to the relevant section in the building
docs above?
##########
File path: docs/source/cpp/build_system.rst
##########
@@ -163,3 +163,24 @@ can control the source of each dependency and whether it
is statically or
dynamically linked. See :doc:`/developers/cpp/building` for instructions. Or
alternatively, use Arrow from a package manager such as Conda or vcpkg which
will manage consistent versions of Arrow and its dependencies.
+
+
+Runtime Dependencies
+====================
+
+While Arrow uses the OS-provided timezone database on Linux and Mac OS, it
+requires a user-provided database on Windows. You must download and extract the
+text version of the IANA timezone database and add the Windows timezone mapping
+XML. To download, you can use the following batch script:
+
+.. literalinclude:: ../../../ci/appveyor-cpp-setup.bat
+ :language: cmd
+ :start-after: @rem (Doc section: Download timezone database)
+ :end-before: @rem (Doc section: Download timezone database)
+
+By default, the timezone database will be detected at
``%USERPROFILE%\Downloads\tzdata``,
Review comment:
Is this a reliable location? Is there a risk that the downloads folder
gets cleared from time to time?
##########
File path: cpp/src/arrow/config.cc
##########
@@ -73,7 +76,30 @@ RuntimeInfo GetRuntimeInfo() {
MakeSimdLevelString([&](int64_t flags) { return
cpu_info->IsSupported(flags); });
info.detected_simd_level =
MakeSimdLevelString([&](int64_t flags) { return
cpu_info->IsDetected(flags); });
+ info.using_os_timezone_db = USE_OS_TZDB;
+#if !USE_OS_TZDB
+ info.timezone_db_path = timezone_db_path;
+#endif
return info;
}
+Status Initialize(const ArrowGlobalOptions& options) noexcept {
+ if (options.tz_db_path.has_value()) {
+#if !USE_OS_TZDB
+ try {
+ arrow_vendored::date::set_install(options.tz_db_path.value());
+ arrow_vendored::date::reload_tzdb();
+ } catch (const std::runtime_error& e) {
+ return Status::ExecutionError(e.what());
+ }
+ timezone_db_path = options.tz_db_path.value();
+#else
+ return Status::Invalid(
+ "Arrow was set to use OS timezone database at compile time, so it
cannot be set "
+ "at runtime.");
Review comment:
The error message is weird, it reads like you can set to use OS timezone
database at runtime.
##########
File path: cpp/src/arrow/public_api_test.cc
##########
@@ -103,4 +104,40 @@ TEST(Misc, BuildInfo) {
ASSERT_THAT(info.full_so_version, ::testing::HasSubstr(info.so_version));
}
+TEST(Misc, SetTimzoneConfig) {
+#ifndef _WIN32
+ GTEST_SKIP() << "Can only set the Timezone database on Windows";
+#else
+#ifndef ARROW_FILESYSTEM
+ GTEST_SKIP() << "Need filesystem support to test timezone config.";
+#else
+ auto fs = std::make_shared<arrow::fs::LocalFileSystem>();
+ auto home_raw = std::getenv("USERPROFILE");
+ std::string home = home_raw == nullptr ? "~" : std::string(home_raw);
+ ASSERT_OK_AND_ASSIGN(std::string tzdata_dir,
+ fs->NormalizePath(home + "\\Downloads\\tzdata"));
+ ASSERT_OK_AND_ASSIGN(auto tzdata_path,
+
arrow::internal::PlatformFilename::FromString(tzdata_dir));
+ if (!arrow::internal::FileExists(tzdata_path).ValueOr(false)) {
+ GTEST_SKIP() << "Couldn't find timezone database in expected dir: " <<
tzdata_dir;
+ }
+ // Create a tmp directory
+ ASSERT_OK_AND_ASSIGN(auto tempdir,
arrow::internal::TemporaryDir::Make("tzdata"));
+
+ // Validate that setting tzdb to that dir fails
+ arrow::ArrowGlobalOptions options =
{util::make_optional(tempdir->path().ToString())};
+ ASSERT_NOT_OK(arrow::Initialize(options));
+
+ // Copy tzdb data from ~/Downloads
+ auto selector = arrow::fs::FileSelector();
+ selector.base_dir = tzdata_dir;
+ selector.recursive = true;
+ ASSERT_OK(arrow::fs::CopyFiles(fs, selector, fs,
tempdir->path().ToString()));
+
+ // Validate that tzdb is working
+ ASSERT_OK(arrow::Initialize(options));
Review comment:
Hmm, the problem is that later tests may then pick up your temporary
directory for the timezone db, which will then not exist anymore?
##########
File path: cpp/src/arrow/config.cc
##########
@@ -73,7 +76,30 @@ RuntimeInfo GetRuntimeInfo() {
MakeSimdLevelString([&](int64_t flags) { return
cpu_info->IsSupported(flags); });
info.detected_simd_level =
MakeSimdLevelString([&](int64_t flags) { return
cpu_info->IsDetected(flags); });
+ info.using_os_timezone_db = USE_OS_TZDB;
+#if !USE_OS_TZDB
+ info.timezone_db_path = timezone_db_path;
+#endif
return info;
}
+Status Initialize(const ArrowGlobalOptions& options) noexcept {
+ if (options.tz_db_path.has_value()) {
+#if !USE_OS_TZDB
+ try {
+ arrow_vendored::date::set_install(options.tz_db_path.value());
+ arrow_vendored::date::reload_tzdb();
+ } catch (const std::runtime_error& e) {
+ return Status::ExecutionError(e.what());
Review comment:
That status code is really Gandiva specific.
```suggestion
return Status::IOError(e.what());
```
##########
File path: cpp/src/arrow/config.h
##########
@@ -77,4 +86,13 @@ const BuildInfo& GetBuildInfo();
ARROW_EXPORT
RuntimeInfo GetRuntimeInfo();
+struct ArrowGlobalOptions {
Review comment:
Note sure it's worth putting "Arrow" here since this is in the `arrow`
namespace.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]