This is an automated email from the ASF dual-hosted git repository.
dongjoon pushed a commit to branch branch-2.2
in repository https://gitbox.apache.org/repos/asf/orc.git
The following commit(s) were added to refs/heads/branch-2.2 by this push:
new 481aadf4b ORC-2011: [C++] Fix `Timezone` to support legacy `US` TimeZone identifiers
481aadf4b is described below
commit 481aadf4bbcaf2ebc36e862dafa6a628b94283d2
Author: Dongjoon Hyun <[email protected]>
AuthorDate: Sat Sep 27 08:03:46 2025 -0700
ORC-2011: [C++] Fix `Timezone` to support legacy `US` TimeZone identifiers
### What changes were proposed in this pull request?
This PR aims to fix `Timezone` to support legacy `US` TimeZone identifiers.
### Why are the changes needed?
Since `Ubuntu 24.04` and `Debian 13` do not provide the legacy
`/usr/share/zoneinfo/US/*` files, the ORC C++ library fails with the following
error by default. The message is misleading because neither installing a recent
`IANA timezone database` nor setting `TZDIR` resolves the issue. We had better
provide a workaround via aliases.
> C++ exception with description "Time zone file /usr/share/zoneinfo/US/Pacific does not exist.
> Please install IANA time zone database and set TZDIR env." thrown in the test body.
Although there are many legacy timezone identifiers, this PR focuses on the
`US` ones. The remaining legacy identifiers can be handled later based on
usage.
- https://data.iana.org/time-zones/tzdb/backward
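The alias approach can be sketched in isolation as follows. This is a minimal illustration, not the patched ORC code: `kUsAliases` and `resolveZonePath` are hypothetical names, the table holds only four of the `US` entries from the `backward` file, and the sketch omits the caching and locking that `getTimezoneByFilename` performs.

```cpp
#include <map>
#include <string>

// Hypothetical subset of the US aliases listed in the IANA `backward` file.
static const std::map<std::string, std::string> kUsAliases = {
    {"US/Central", "America/Chicago"},
    {"US/Eastern", "America/New_York"},
    {"US/Mountain", "America/Denver"},
    {"US/Pacific", "America/Los_Angeles"},
};

// Hypothetical helper: build the zone file path under `dir`, replacing a
// legacy US identifier with its canonical name when an alias is known.
// Names without an alias are passed through unchanged.
std::string resolveZonePath(const std::string& dir, const std::string& zone) {
  auto it = kUsAliases.find(zone);
  std::string canonical = (it == kUsAliases.end()) ? zone : it->second;
  return dir + "/" + canonical;
}
```

With this sketch, `resolveZonePath("/usr/share/zoneinfo", "US/Pacific")` yields a path ending in `America/Los_Angeles`, which exists even when the distribution ships no `US/` directory.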
### How was this patch tested?
Pass the CIs, and manually run a Docker test with the following lines removed.
https://github.com/apache/orc/blob/fbea8e016699ad8e7b318f5c793b4e48fe85af57/docker/ubuntu24/Dockerfile#L58
https://github.com/apache/orc/blob/fbea8e016699ad8e7b318f5c793b4e48fe85af57/docker/debian13/Dockerfile#L40
I verified locally with the revised `Debian 13` image.
```
$ docker run -it --rm apache/orc-dev:debian13 ls -al /usr/share/zoneinfo/US
ls: cannot access '/usr/share/zoneinfo/US': No such file or directory
$ ./run-one.sh local x debian13
Started local run for ORC-2011 on debian13 at Fri Sep 26 21:54:25 PDT 2025
-- The C compiler identification is GNU 14.2.0
-- The CXX compiler identification is GNU 14.2.0
...
Test project /root/build
Start 1: orc-test
1/9 Test #1: orc-test ......................... Passed 7.24 sec
Start 2: java-test
2/9 Test #2: java-test ........................ Passed 110.33 sec
Start 3: java-examples-test
3/9 Test #3: java-examples-test ............... Passed 0.37 sec
Start 4: java-tools-test
4/9 Test #4: java-tools-test .................. Passed 0.06 sec
Start 5: java-bench-gen-test
5/9 Test #5: java-bench-gen-test .............. Passed 0.71 sec
Start 6: java-bench-scan-test
6/9 Test #6: java-bench-scan-test ............. Passed 0.66 sec
Start 7: java-bench-hive-test
7/9 Test #7: java-bench-hive-test ............. Passed 11.14 sec
Start 8: java-bench-spark-test
8/9 Test #8: java-bench-spark-test ............ Passed 214.61 sec
Start 9: tool-test
9/9 Test #9: tool-test ........................ Passed 5.00 sec
100% tests passed, 0 tests failed out of 9
Total Test time (real) = 350.16 sec
Built target test-out
Finished debian13 at Fri Sep 26 22:06:39 PDT 2025
```
Please note that test coverage should be added separately. In other
words, the Docker images should be updated **selectively and gradually** after
this PR because the images are shared among multiple ORC branches. Since
`Debian 13` was newly added for `main` and `branch-2.2` only, I plan to
update the following line after merging this PR to add test coverage for this
feature.
https://github.com/apache/orc/blob/fbea8e016699ad8e7b318f5c793b4e48fe85af57/docker/debian13/Dockerfile#L40
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #2422 from dongjoon-hyun/ORC-2011.
Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit 3c89afe70b9e03db694ecac1702f1a31d8e0d230)
Signed-off-by: Dongjoon Hyun <[email protected]>
---
c++/src/Timezone.cc | 43 ++++++++++++++++++++++++++++++++++---------
1 file changed, 34 insertions(+), 9 deletions(-)
diff --git a/c++/src/Timezone.cc b/c++/src/Timezone.cc
index 384f8ea99..bc56efa0d 100644
--- a/c++/src/Timezone.cc
+++ b/c++/src/Timezone.cc
@@ -33,8 +33,24 @@ namespace orc {
// default location of the timezone files
static const char DEFAULT_TZDIR[] = "/usr/share/zoneinfo";
- // location of a symlink to the local timezone
- static const char LOCAL_TIMEZONE[] = "/etc/localtime";
+ // location of a symlink to the local timezone is /etc/localtime
+ static const char LOCAL_TIMEZONE_DIR[] = "/etc";
+ static const char LOCAL_TIMEZONE[] = "localtime";
+
+ // US aliases from https://data.iana.org/time-zones/tzdb/backward
+ static const std::map<const std::string, const std::string> TZ_ALIASES = {
+ {"US/Alaska", "America/Anchorage"},
+ {"US/Aleutian", "America/Adak"},
+ {"US/Arizona", "America/Phoenix"},
+ {"US/Central", "America/Chicago"},
+ {"US/East-Indiana", "America/Indiana/Indianapolis"},
+ {"US/Eastern", "America/New_York"},
+ {"US/Hawaii", "Pacific/Honolulu"},
+ {"US/Indiana-Starke", "America/Indiana/Knox"},
+ {"US/Michigan", "America/Detroit"},
+ {"US/Mountain", "America/Denver"},
+ {"US/Pacific", "America/Los_Angeles"},
+ {"US/Samoa", "Pacific/Pago_Pago"}};
enum TransitionKind { TRANSITION_JULIAN, TRANSITION_DAY, TRANSITION_MONTH };
@@ -734,14 +750,26 @@ namespace orc {
* Get a timezone by absolute filename.
* Results are cached.
*/
- const Timezone& getTimezoneByFilename(const std::string& filename) {
+ const Timezone& getTimezoneByFilename(const std::string& dir, const std::string& zone) {
+ std::string filename(dir);
+ filename += "/";
+ filename += zone;
// ORC-110
std::lock_guard<std::mutex> timezone_lock(timezone_mutex);
std::map<std::string, std::shared_ptr<Timezone> >::iterator itr =
timezoneCache.find(filename);
if (itr != timezoneCache.end()) {
return *(itr->second).get();
}
- timezoneCache[filename] = std::make_shared<LazyTimezone>(filename);
+ auto it = TZ_ALIASES.find(zone);
+ if (it == TZ_ALIASES.end()) {
+ timezoneCache[filename] = std::make_shared<LazyTimezone>(filename);
+ } else {
+ std::string newfilename(dir);
+ newfilename += "/";
+ newfilename += it->second;
+ timezoneCache[newfilename] = std::make_shared<LazyTimezone>(newfilename);
+ timezoneCache[filename] = timezoneCache[newfilename];
+ }
return *timezoneCache[filename].get();
}
@@ -752,7 +780,7 @@ namespace orc {
#ifdef _MSC_VER
return getTimezoneByName("UTC");
#else
- return getTimezoneByFilename(LOCAL_TIMEZONE);
+ return getTimezoneByFilename(LOCAL_TIMEZONE_DIR, LOCAL_TIMEZONE);
#endif
}
@@ -761,10 +789,7 @@ namespace orc {
* Results are cached.
*/
const Timezone& getTimezoneByName(const std::string& zone) {
- std::string filename(getTimezoneDirectory());
- filename += "/";
- filename += zone;
- return getTimezoneByFilename(filename);
+ return getTimezoneByFilename(getTimezoneDirectory(), zone);
}
/**