h-vetinari commented on code in PR #41767:
URL: https://github.com/apache/arrow/pull/41767#discussion_r1614977296
##########
cpp/src/arrow/adapters/orc/adapter.cc:
##########
@@ -189,15 +189,21 @@ liborc::RowReaderOptions DefaultRowReaderOptions() {
#ifdef ARROW_ORC_NEED_TIME_ZONE_DATABASE_CHECK
// Proactively check timezone database availability for ORC versions older
than 2.0.0
Status CheckTimeZoneDatabaseAvailability() {
- auto tz_dir = std::getenv("TZDIR");
- bool is_tzdb_avaiable = tz_dir != nullptr
- ? std::filesystem::exists(tz_dir)
- : std::filesystem::exists("/usr/share/zoneinfo");
- if (!is_tzdb_avaiable) {
+ // orc >=2.0.1 will look for tzdb in $CONDA_PREFIX/share/zoneinfo,
+ // which is provided by the package `tzdata` (if installed)
+ auto conda_prefix = std::getenv("CONDA_PREFIX");
+ auto tz_dir_raw = std::getenv("TZDIR");
+ std::string tz_dir = conda_prefix != nullptr
+ ? std::string(conda_prefix) + "/share/zoneinfo"
+ : std::string(tz_dir_raw != nullptr ? tz_dir_raw :
"");
+ bool is_tzdb_available = (!tz_dir.empty())
+ ? std::filesystem::exists(tz_dir)
+ :
std::filesystem::exists("/usr/share/zoneinfo");
+ if (!is_tzdb_available) {
return Status::Invalid(
"IANA time zone database is unavailable but required by ORC."
" Please install it to /usr/share/zoneinfo or set TZDIR env to the
installed"
- " directory");
+ " directory. If you are using conda, simply install the package
`tzdata`.");
Review Comment:
> ORC 2.0.1 can find tzdb without setting `TZDIR` with conda
(`CONDA_PREFIX`), right?
Yes. If the environment variable `CONDA_PREFIX` is set, the code will prefer
looking in `$CONDA_PREFIX/share/zoneinfo`, which is what's shipped through
`tzdata`.
> `libarrow` conda package uses ORC 2.0.1, right?
As of https://github.com/conda-forge/arrow-cpp-feedstock/pull/1421 (and the
various backports), yes.
> So this message will not be shown for conda users. If this message is
shown with ORC 2.0.1, ORC version detection may be failed.
Indeed something might be going wrong there, because
https://github.com/conda-forge/pyarrow-feedstock/pull/122 failed before I
backported this PR (in
https://github.com/conda-forge/arrow-cpp-feedstock/pull/1424), despite building
against orc 2.0.1 already
> Can we add `tzdata` to dependencies of `orc` only on Windows? If we can do
it, it will solve this problem.
We can add tzdata as a dependency on `orc`, but I want to make this
unconditional. It's better to always rely on conda-forge's tzdata, rather than
potentially out-of-date stuff in `/usr/share/zoneinfo` (aside from not
introducing unnecessary divergence between platforms).
In any case, the presence of `tzdata` itself is not enough. We either need
to ensure that `ARROW_ORC_NEED_TIME_ZONE_DATABASE_CHECK` correctly gets set to
false on orc >=2.0.1, or do something like this PR. I think this PR is better,
because you cannot know at compile-time if the package will end up in a
conda-environment, and IMO this check should match what `orc` does, i.e. check
`CONDA_PREFIX`.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]