h-vetinari commented on code in PR #41767:
URL: https://github.com/apache/arrow/pull/41767#discussion_r1609284384


##########
cpp/src/arrow/adapters/orc/adapter.cc:
##########
@@ -189,15 +189,21 @@ liborc::RowReaderOptions DefaultRowReaderOptions() {
 #ifdef ARROW_ORC_NEED_TIME_ZONE_DATABASE_CHECK
 // Proactively check timezone database availability for ORC versions older than 2.0.0
 Status CheckTimeZoneDatabaseAvailability() {
-  auto tz_dir = std::getenv("TZDIR");
-  bool is_tzdb_avaiable = tz_dir != nullptr
-                              ? std::filesystem::exists(tz_dir)
-                              : std::filesystem::exists("/usr/share/zoneinfo");
-  if (!is_tzdb_avaiable) {
+  // orc >=2.0.1 will look for tzdb in $CONDA_PREFIX/share/zoneinfo,
+  // which is provided by the package `tzdata` (if installed)
+  auto conda_prefix = std::getenv("CONDA_PREFIX");
+  auto tz_dir_raw = std::getenv("TZDIR");
+  std::string tz_dir = conda_prefix != nullptr
+                           ? std::string(conda_prefix) + "/share/zoneinfo"
+                           : std::string(tz_dir_raw != nullptr ? tz_dir_raw : "");
+  bool is_tzdb_available = (!tz_dir.empty())
+                               ? std::filesystem::exists(tz_dir)
+                               : std::filesystem::exists("/usr/share/zoneinfo");
+  if (!is_tzdb_available) {
     return Status::Invalid(
         "IANA time zone database is unavailable but required by ORC."
         " Please install it to /usr/share/zoneinfo or set TZDIR env to the installed"
-        " directory");
+        " directory. If you are using conda, simply install the package `tzdata`.");

Review Comment:
   > I don't know why this is happen with conda.
   
   I described this in the issue (#41755) and in #36026: it is due to 
https://github.com/apache/orc/pull/1882. That change was the only way to avoid 
injecting `TZDIR` as an environment variable into every conda environment that 
uses Arrow on Windows, which would have been very intrusive.
   
   > Can we add `tzdata` to dependencies of `arrow-cpp` or `orc` conda packages?
   
   That already happened in 
https://github.com/conda-forge/pyarrow-feedstock/pull/122 (with backports to 
15.x, 14.x, and 13.x). More generally, someone needs to sync the feedstock 
changes back to this repo, but that is unrelated to this PR.
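
   For reference, the lookup order in the diff above can be sketched as a 
standalone helper. This is only an illustration under assumptions: 
`ResolveTzDir` and `IsTzdbAvailable` are hypothetical names, not functions in 
Arrow or ORC, and the sketch mirrors the order used in this PR's check 
(`CONDA_PREFIX` first, then `TZDIR`, then `/usr/share/zoneinfo`) rather than 
ORC's own internal lookup.

```cpp
#include <cstdlib>
#include <filesystem>
#include <string>
#include <system_error>

// Hypothetical helper (not part of Arrow or ORC): choose the directory to
// probe for the IANA time zone database, in the same order as the diff.
std::string ResolveTzDir(const char* conda_prefix, const char* tz_dir_raw) {
  // 1. Inside a conda environment, the `tzdata` package installs the
  //    database under $CONDA_PREFIX/share/zoneinfo.
  if (conda_prefix != nullptr) {
    return std::string(conda_prefix) + "/share/zoneinfo";
  }
  // 2. Otherwise honor TZDIR if it is set and non-empty.
  if (tz_dir_raw != nullptr && *tz_dir_raw != '\0') {
    return std::string(tz_dir_raw);
  }
  // 3. Fall back to the conventional system location.
  return "/usr/share/zoneinfo";
}

// Hypothetical wrapper: reads the real environment and checks the directory.
// The error_code overload is used so that a filesystem error is reported as
// "not available" instead of throwing.
bool IsTzdbAvailable() {
  std::error_code ec;
  return std::filesystem::exists(
      ResolveTzDir(std::getenv("CONDA_PREFIX"), std::getenv("TZDIR")), ec);
}
```

   Keeping the path selection separate from the environment and filesystem 
access makes the precedence easy to unit-test without touching the real 
environment.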



