This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-2.2
in repository https://gitbox.apache.org/repos/asf/orc.git


The following commit(s) were added to refs/heads/branch-2.2 by this push:
     new 481aadf4b ORC-2011: [C++] Fix `Timezone` to support legacy `US` TimeZone identifiers
481aadf4b is described below

commit 481aadf4bbcaf2ebc36e862dafa6a628b94283d2
Author: Dongjoon Hyun <[email protected]>
AuthorDate: Sat Sep 27 08:03:46 2025 -0700

    ORC-2011: [C++] Fix `Timezone` to support legacy `US` TimeZone identifiers
    
    ### What changes were proposed in this pull request?
    
    This PR aims to fix `Timezone` to support legacy `US` TimeZone identifiers.
    
    ### Why are the changes needed?
    
    Since `Ubuntu 24.04` and `Debian 13` don't provide the old `/usr/share/zoneinfo/US/*` files, the ORC C++ library fails with the following error by default. The message is misleading because neither installing a recent `IANA timezone database` nor setting `TZDIR` resolves the issue. We had better provide a workaround via aliases.
    
    > C++ exception with description "Time zone file /usr/share/zoneinfo/US/Pacific does not exist.
    > Please install IANA time zone database and set TZDIR env." thrown in the test body.
    
    Although there are many legacy timezone identifiers, this PR focuses on the `US` ones. The remaining legacy identifiers can be handled later based on usage.
    
    - https://data.iana.org/time-zones/tzdb/backward
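    
    The alias workaround can be sketched as follows. This is a minimal illustration only, not the code in this patch: `kUsAliases`, `resolveZoneName`, and `zoneFilePath` are hypothetical names, and the table lists just three of the `US` entries from the `backward` file. A legacy identifier is mapped to its canonical name before the zoneinfo path is built, and any other identifier passes through unchanged.
    
    ```cpp
    #include <map>
    #include <string>
    
    // Hypothetical subset of the legacy `US` aliases (assumption: names and
    // helper functions below are for illustration, not part of the ORC API).
    static const std::map<std::string, std::string> kUsAliases = {
        {"US/Eastern", "America/New_York"},
        {"US/Hawaii", "Pacific/Honolulu"},
        {"US/Pacific", "America/Los_Angeles"}};
    
    // Returns the canonical name for a legacy identifier, or the input
    // unchanged when no alias is registered for it.
    std::string resolveZoneName(const std::string& zone) {
      auto it = kUsAliases.find(zone);
      return it == kUsAliases.end() ? zone : it->second;
    }
    
    // Builds the zoneinfo file path for a zone after alias resolution.
    std::string zoneFilePath(const std::string& tzdir, const std::string& zone) {
      return tzdir + "/" + resolveZoneName(zone);
    }
    ```
    
    With this sketch, `zoneFilePath("/usr/share/zoneinfo", "US/Pacific")` points at the `America/Los_Angeles` file, which exists even when the `US` directory is missing.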
    
    ### How was this patch tested?
    
    Pass the CIs and manually run a Docker test with the following lines removed.
    
    
https://github.com/apache/orc/blob/fbea8e016699ad8e7b318f5c793b4e48fe85af57/docker/ubuntu24/Dockerfile#L58
    
    
https://github.com/apache/orc/blob/fbea8e016699ad8e7b318f5c793b4e48fe85af57/docker/debian13/Dockerfile#L40
    
    I verified locally with the revised `Debian 13` image.
    
    ```
    $ docker run -it --rm apache/orc-dev:debian13 ls -al /usr/share/zoneinfo/US
    ls: cannot access '/usr/share/zoneinfo/US': No such file or directory
    
    $ ./run-one.sh local x debian13
    Started local run for ORC-2011 on debian13 at Fri Sep 26 21:54:25 PDT 2025
    -- The C compiler identification is GNU 14.2.0
    -- The CXX compiler identification is GNU 14.2.0
    ...
    
    Test project /root/build
        Start 1: orc-test
    1/9 Test #1: orc-test .........................   Passed    7.24 sec
        Start 2: java-test
    2/9 Test #2: java-test ........................   Passed  110.33 sec
        Start 3: java-examples-test
    3/9 Test #3: java-examples-test ...............   Passed    0.37 sec
        Start 4: java-tools-test
    4/9 Test #4: java-tools-test ..................   Passed    0.06 sec
        Start 5: java-bench-gen-test
    5/9 Test #5: java-bench-gen-test ..............   Passed    0.71 sec
        Start 6: java-bench-scan-test
    6/9 Test #6: java-bench-scan-test .............   Passed    0.66 sec
        Start 7: java-bench-hive-test
    7/9 Test #7: java-bench-hive-test .............   Passed   11.14 sec
        Start 8: java-bench-spark-test
    8/9 Test #8: java-bench-spark-test ............   Passed  214.61 sec
        Start 9: tool-test
    9/9 Test #9: tool-test ........................   Passed    5.00 sec
    
    100% tests passed, 0 tests failed out of 9
    
    Total Test time (real) = 350.16 sec
    Built target test-out
    Finished debian13 at Fri Sep 26 22:06:39 PDT 2025
    ```
    
    Please note that test coverage should be added separately. In other words, the docker images should be updated **selectively and gradually** after this PR because the images are shared among multiple ORC branches. Since `Debian 13` was newly added for `main` and `branch-2.2` only, I'm planning to update the following after merging this PR to add test coverage for this feature.
    
    
https://github.com/apache/orc/blob/fbea8e016699ad8e7b318f5c793b4e48fe85af57/docker/debian13/Dockerfile#L40
    
    ### Was this patch authored or co-authored using generative AI tooling?
    
    No.
    
    Closes #2422 from dongjoon-hyun/ORC-2011.
    
    Authored-by: Dongjoon Hyun <[email protected]>
    Signed-off-by: Dongjoon Hyun <[email protected]>
    (cherry picked from commit 3c89afe70b9e03db694ecac1702f1a31d8e0d230)
    Signed-off-by: Dongjoon Hyun <[email protected]>
---
 c++/src/Timezone.cc | 43 ++++++++++++++++++++++++++++++++++---------
 1 file changed, 34 insertions(+), 9 deletions(-)

diff --git a/c++/src/Timezone.cc b/c++/src/Timezone.cc
index 384f8ea99..bc56efa0d 100644
--- a/c++/src/Timezone.cc
+++ b/c++/src/Timezone.cc
@@ -33,8 +33,24 @@ namespace orc {
   // default location of the timezone files
   static const char DEFAULT_TZDIR[] = "/usr/share/zoneinfo";
 
-  // location of a symlink to the local timezone
-  static const char LOCAL_TIMEZONE[] = "/etc/localtime";
+  // location of a symlink to the local timezone is /etc/localtime
+  static const char LOCAL_TIMEZONE_DIR[] = "/etc";
+  static const char LOCAL_TIMEZONE[] = "localtime";
+
+  // US aliases from https://data.iana.org/time-zones/tzdb/backward
+  static const std::map<const std::string, const std::string> TZ_ALIASES = {
+      {"US/Alaska", "America/Anchorage"},
+      {"US/Aleutian", "America/Adak"},
+      {"US/Arizona", "America/Phoenix"},
+      {"US/Central", "America/Chicago"},
+      {"US/East-Indiana", "America/Indiana/Indianapolis"},
+      {"US/Eastern", "America/New_York"},
+      {"US/Hawaii", "Pacific/Honolulu"},
+      {"US/Indiana-Starke", "America/Indiana/Knox"},
+      {"US/Michigan", "America/Detroit"},
+      {"US/Mountain", "America/Denver"},
+      {"US/Pacific", "America/Los_Angeles"},
+      {"US/Samoa", "Pacific/Pago_Pago"}};
 
   enum TransitionKind { TRANSITION_JULIAN, TRANSITION_DAY, TRANSITION_MONTH };
 
@@ -734,14 +750,26 @@ namespace orc {
    * Get a timezone by absolute filename.
    * Results are cached.
    */
-  const Timezone& getTimezoneByFilename(const std::string& filename) {
+  const Timezone& getTimezoneByFilename(const std::string& dir, const std::string& zone) {
+    std::string filename(dir);
+    filename += "/";
+    filename += zone;
     // ORC-110
     std::lock_guard<std::mutex> timezone_lock(timezone_mutex);
     std::map<std::string, std::shared_ptr<Timezone> >::iterator itr = timezoneCache.find(filename);
     if (itr != timezoneCache.end()) {
       return *(itr->second).get();
     }
-    timezoneCache[filename] = std::make_shared<LazyTimezone>(filename);
+    auto it = TZ_ALIASES.find(zone);
+    if (it == TZ_ALIASES.end()) {
+      timezoneCache[filename] = std::make_shared<LazyTimezone>(filename);
+    } else {
+      std::string newfilename(dir);
+      newfilename += "/";
+      newfilename += it->second;
+      timezoneCache[newfilename] = std::make_shared<LazyTimezone>(newfilename);
+      timezoneCache[filename] = timezoneCache[newfilename];
+    }
     return *timezoneCache[filename].get();
   }
 
@@ -752,7 +780,7 @@ namespace orc {
 #ifdef _MSC_VER
     return getTimezoneByName("UTC");
 #else
-    return getTimezoneByFilename(LOCAL_TIMEZONE);
+    return getTimezoneByFilename(LOCAL_TIMEZONE_DIR, LOCAL_TIMEZONE);
 #endif
   }
 
@@ -761,10 +789,7 @@ namespace orc {
    * Results are cached.
    */
   const Timezone& getTimezoneByName(const std::string& zone) {
-    std::string filename(getTimezoneDirectory());
-    filename += "/";
-    filename += zone;
-    return getTimezoneByFilename(filename);
+    return getTimezoneByFilename(getTimezoneDirectory(), zone);
   }
 
   /**
