IMPALA-5357: Fix unixtime to UTC TimestampValue perf Fixes the severe perf issue when calling gmtime_r to convert a unix time because that libc function takes a global lock. Instead, use boost ptime::from_time_t when possible. Unfortunately only a range of input times are supported without overflowing the underlying time_duration struct (dates between 1677-2262), so those dates outside that range are handled with the slow path calling into gmtime_r.
This requires using a patched build of boost which includes an upstream fix for the time_duration class, see: https://github.com/cloudera/native-toolchain/commit/6e726b4b8164d53814f75b78a681fcf6df4a887a Testing: * Ran an exhaustive test run successfully. * Wrote a test program to validate that all time_t values between 1677 and 2262 are converted to the same ptime when using the new code path and the old path. Doing so required running all night, so I'm not going to attempt to add this test. I think a single validation is acceptable. Perf impact: Microbenchmark of the new path (conversion using boost) and the old path (conversion using libc gmtime_r) resulted in the expected speedup from not using a global lock. Machine Info: Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz Function 10%ile 50%ile 90%ile 10%ile 50%ile 90%ile iters/ms iters/ms iters/ms (relative) (relative) (relative) --------------------------------------------------------------------- libc-1 0.192 0.196 0.2 1X 1X 1X boost-1 0.333 0.34 0.34 1.73X 1.73X 1.7X libc-2 0.112 0.115 0.116 0.584X 0.586X 0.581X boost-2 0.627 0.654 0.667 3.26X 3.33X 3.33X libc-4 0.042 0.0478 0.0515 0.218X 0.244X 0.258X boost-4 0.863 1.27 1.3 4.49X 6.5X 6.5X libc-8 0.0326 0.0328 0.0329 0.169X 0.167X 0.164X boost-8 0.741 0.902 1.1 3.85X 4.6X 5.5X Change-Id: I9d611d21310c7b4f93d8e0bc845eb85125abcac9 Reviewed-on: http://gerrit.cloudera.org:8080/7082 Reviewed-by: Matthew Jacobs <[email protected]> Tested-by: Impala Public Jenkins Project: http://git-wip-us.apache.org/repos/asf/incubator-impala/repo Commit: http://git-wip-us.apache.org/repos/asf/incubator-impala/commit/dad48966 Tree: http://git-wip-us.apache.org/repos/asf/incubator-impala/tree/dad48966 Diff: http://git-wip-us.apache.org/repos/asf/incubator-impala/diff/dad48966 Branch: refs/heads/master Commit: dad4896693c99f6a59be5c25bed2297a1ed57c5f Parents: 621af4b Author: Matthew Jacobs <[email protected]> Authored: Mon Jun 5 12:28:43 2017 -0700 Committer: Impala Public Jenkins <[email protected]> Committed: Tue Jun 6 20:33:50 2017 +0000 ---------------------------------------------------------------------- be/src/runtime/timestamp-test.cc | 14 ++++++++ be/src/runtime/timestamp-value.cc | 61 ++++++++++++++++++++++++++++------ 2 files changed, 64 insertions(+), 11 deletions(-) ---------------------------------------------------------------------- http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/dad48966/be/src/runtime/timestamp-test.cc ---------------------------------------------------------------------- diff --git a/be/src/runtime/timestamp-test.cc b/be/src/runtime/timestamp-test.cc index 32fc60f..c18313b 100644 --- a/be/src/runtime/timestamp-test.cc +++ b/be/src/runtime/timestamp-test.cc @@ -667,6 +667,20 @@ TEST(TimestampTest, Basic) { EXPECT_EQ("2038-01-19 03:14:09", TimestampValue::FromUnixTime(2147483649).ToString()); + // Test FromUnixTime around the boundary of the values that are converted via boost via + // gmtime (IMPALA-5357). Tests 1 second before and after the values supported by the + // boost conversion logic. + const int64_t MIN_BOOST_CONVERT_UNIX_TIME = -9223372036; + const int64_t MAX_BOOST_CONVERT_UNIX_TIME = 9223372036; + EXPECT_EQ("1677-09-21 00:12:43", + TimestampValue::FromUnixTime(MIN_BOOST_CONVERT_UNIX_TIME - 1).ToString()); + EXPECT_EQ("1677-09-21 00:12:44", + TimestampValue::FromUnixTime(MIN_BOOST_CONVERT_UNIX_TIME).ToString()); + EXPECT_EQ("2262-04-11 23:47:16", + TimestampValue::FromUnixTime(MAX_BOOST_CONVERT_UNIX_TIME).ToString()); + EXPECT_EQ("2262-04-11 23:47:17", + TimestampValue::FromUnixTime(MAX_BOOST_CONVERT_UNIX_TIME + 1).ToString()); + // Test a leap second in 1998 represented by the UTC time 1998-12-31 23:59:60. // Unix time cannot represent the leap second, which repeats 915148800. EXPECT_EQ("1998-12-31 23:59:59", http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/dad48966/be/src/runtime/timestamp-value.cc ---------------------------------------------------------------------- diff --git a/be/src/runtime/timestamp-value.cc b/be/src/runtime/timestamp-value.cc index 32252a0..56dfbda 100644 --- a/be/src/runtime/timestamp-value.cc +++ b/be/src/runtime/timestamp-value.cc @@ -26,6 +26,7 @@ using boost::date_time::not_a_date_time; using boost::gregorian::date; using boost::gregorian::date_duration; +using boost::posix_time::from_time_t; using boost::posix_time::nanoseconds; using boost::posix_time::ptime; using boost::posix_time::ptime_from_tm; @@ -111,21 +112,51 @@ ostream& operator<<(ostream& os, const TimestampValue& timestamp_value) { return os << timestamp_value.ToString(); } -ptime TimestampValue::UnixTimeToPtime(time_t unix_time) { - /// Unix times are represented internally in boost as 32 bit ints which limits the - /// range of dates to 1901-2038 (https://svn.boost.org/trac/boost/ticket/3109), so - /// libc functions will be used instead. - // TODO: Conversion using libc is very expensive (IMPALA-5357); find an alternative. +/// Return a ptime representation of the given Unix time (seconds since the Unix epoch). +/// The time zone of the resulting ptime is local time. This is called by +/// UnixTimeToPtime. +inline ptime UnixTimeToLocalPtime(time_t unix_time) { tm temp_tm; - if (FLAGS_use_local_tz_for_unix_timestamp_conversions) { - if (UNLIKELY(localtime_r(&unix_time, &temp_tm) == nullptr)) { - return ptime(not_a_date_time); - } - } else { - if (UNLIKELY(gmtime_r(&unix_time, &temp_tm) == nullptr)) { + // TODO: avoid localtime*, which takes a global timezone db lock + if (UNLIKELY(localtime_r(&unix_time, &temp_tm) == nullptr)) { + return ptime(not_a_date_time); + } + try { + return ptime_from_tm(temp_tm); + } catch (std::exception&) { + return ptime(not_a_date_time); + } +} + +/// Return a ptime representation of the given Unix time (seconds since the Unix epoch). +/// The time zone of the resulting ptime is UTC. +/// In order to avoid a serious performance degredation using libc (IMPALA-5357), this +/// function uses boost to convert the time_t to a ptime. Unfortunately, because the boost +/// conversion relies on time_duration to represent the time_t and internally +/// time_duration stores nanosecond precision ticks, the 'fast path' conversion using +/// boost can only handle a limited range of dates (appx years 1677-2622, while Impala +/// supports years 1600-9999). For dates outside this range, the conversion will instead +/// use the libc function gmtime_r which supports those dates but takes the global lock +/// for the timezone db (even though technically it is not needed for the conversion, +/// again see IMPALA-5357). This is called by UnixTimeToPtime. +inline ptime UnixTimeToUtcPtime(time_t unix_time) { + // Minimum Unix time that can be converted with from_time_t: 1677-Sep-21 00:12:44 + const int64_t MIN_BOOST_CONVERT_UNIX_TIME = -9223372036; + // Maximum Unix time that can be converted with from_time_t: 2262-Apr-11 23:47:16 + const int64_t MAX_BOOST_CONVERT_UNIX_TIME = 9223372036; + if (LIKELY(unix_time >= MIN_BOOST_CONVERT_UNIX_TIME && + unix_time <= MAX_BOOST_CONVERT_UNIX_TIME)) { + try { + return from_time_t(unix_time); + } catch (std::exception&) { return ptime(not_a_date_time); } } + + tm temp_tm; + if (UNLIKELY(gmtime_r(&unix_time, &temp_tm) == nullptr)) { + return ptime(not_a_date_time); + } try { return ptime_from_tm(temp_tm); } catch (std::exception&) { @@ -133,6 +164,14 @@ ptime TimestampValue::UnixTimeToPtime(time_t unix_time) { } } +ptime TimestampValue::UnixTimeToPtime(time_t unix_time) { + if (FLAGS_use_local_tz_for_unix_timestamp_conversions) { + return UnixTimeToLocalPtime(unix_time); + } else { + return UnixTimeToUtcPtime(unix_time); + } +} + string TimestampValue::ToString() const { stringstream ss; if (HasDate()) {
