kou commented on code in PR #41767:
URL: https://github.com/apache/arrow/pull/41767#discussion_r1615318742


##########
cpp/src/arrow/adapters/orc/adapter.cc:
##########
@@ -189,15 +189,21 @@ liborc::RowReaderOptions DefaultRowReaderOptions() {
 #ifdef ARROW_ORC_NEED_TIME_ZONE_DATABASE_CHECK
 // Proactively check timezone database availability for ORC versions older 
than 2.0.0
 Status CheckTimeZoneDatabaseAvailability() {
-  auto tz_dir = std::getenv("TZDIR");
-  bool is_tzdb_avaiable = tz_dir != nullptr
-                              ? std::filesystem::exists(tz_dir)
-                              : std::filesystem::exists("/usr/share/zoneinfo");
-  if (!is_tzdb_avaiable) {
+  // orc >=2.0.1 will look for tzdb in $CONDA_PREFIX/share/zoneinfo,
+  // which is provided by the package `tzdata` (if installed)
+  auto conda_prefix = std::getenv("CONDA_PREFIX");
+  auto tz_dir_raw = std::getenv("TZDIR");
+  std::string tz_dir = conda_prefix != nullptr
+                           ? std::string(conda_prefix) + "/share/zoneinfo"
+                           : std::string(tz_dir_raw != nullptr ? tz_dir_raw : 
"");
+  bool is_tzdb_available = (!tz_dir.empty())
+                               ? std::filesystem::exists(tz_dir)
+                               : 
std::filesystem::exists("/usr/share/zoneinfo");
+  if (!is_tzdb_available) {
     return Status::Invalid(
         "IANA time zone database is unavailable but required by ORC."
         " Please install it to /usr/share/zoneinfo or set TZDIR env to the 
installed"
-        " directory");
+        " directory. If you are using conda, simply install the package 
`tzdata`.");

Review Comment:
   > Indeed something might be going wrong there
   
   Could you try this?
   
   ```diff
   diff --git a/cpp/cmake_modules/FindorcAlt.cmake 
b/cpp/cmake_modules/FindorcAlt.cmake
   index 289416678a..ce8cd11b4c 100644
   --- a/cpp/cmake_modules/FindorcAlt.cmake
   +++ b/cpp/cmake_modules/FindorcAlt.cmake
   @@ -71,4 +71,5 @@ if(orcAlt_FOUND)
                              PROPERTIES IMPORTED_LOCATION "${ORC_STATIC_LIB}"
                                         INTERFACE_INCLUDE_DIRECTORIES 
"${ORC_INCLUDE_DIR}")
      endif()
   +  set(orcAlt_VERSION ${ORC_VERSION})
    endif()
   ```
   
   > We can add tzdata as a dependency on `orc`, but I want to make this 
unconditional. It's better to always rely on conda-forge's tzdata, rather than 
potentially out-of-date stuff in `/usr/share/zoneinfo` (aside from not 
introducing unnecessary divergence between platforms).
   
   It makes sense.
   
   > We either need to ensure that `ARROW_ORC_NEED_TIME_ZONE_DATABASE_CHECK` 
correctly gets set to false on orc >=2.0.1, or do something like this PR.  I 
think this PR is better
   
   We don't want to maintain this code as much as possible ORC 2.0.0 or later 
have the same check. So we want to choose the former.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to