IMPALA-5357: Fix unixtime to UTC TimestampValue perf

Fixes the severe perf issue when calling gmtime_r to convert
a unix time because that libc function takes a global lock.
Instead, use boost ptime::from_time_t when possible.
Unfortunately only a range of input times are supported
without overflowing the underlying time_duration struct
(dates between 1677-2262), so those dates outside that range
are handled with the slow path calling into gmtime_r.

This requires using a patched build of boost which
includes an upstream fix for the time_duration class, see:
https://github.com/cloudera/native-toolchain/commit/6e726b4b8164d53814f75b78a681fcf6df4a887a

Testing:
* Ran an exhaustive test run successfully.
* Wrote a test program to validate that all time_t values
  between 1677 and 2262 are converted to the same ptime
  when using the new code path and the old path. Doing so
  required running all night, so I'm not going to attempt to
  add this test. I think a single validation is acceptable.

Perf impact:
Microbenchmark of the new path (conversion using boost) and
the old path (conversion using libc gmtime_r) resulted in
the expected speedup from not using a global lock.

Machine Info: Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
Function    10%ile   50%ile   90%ile     10%ile     50%ile     90%ile
          iters/ms iters/ms iters/ms (relative) (relative) (relative)
---------------------------------------------------------------------
  libc-1     0.192    0.196      0.2         1X         1X         1X
 boost-1     0.333     0.34     0.34      1.73X      1.73X       1.7X
  libc-2     0.112    0.115    0.116     0.584X     0.586X     0.581X
 boost-2     0.627    0.654    0.667      3.26X      3.33X      3.33X
  libc-4     0.042   0.0478   0.0515     0.218X     0.244X     0.258X
 boost-4     0.863     1.27      1.3      4.49X       6.5X       6.5X
  libc-8    0.0326   0.0328   0.0329     0.169X     0.167X     0.164X
 boost-8     0.741    0.902      1.1      3.85X       4.6X       5.5X

Change-Id: I9d611d21310c7b4f93d8e0bc845eb85125abcac9
Reviewed-on: http://gerrit.cloudera.org:8080/7082
Reviewed-by: Matthew Jacobs <[email protected]>
Tested-by: Impala Public Jenkins


Project: http://git-wip-us.apache.org/repos/asf/incubator-impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-impala/commit/dad48966
Tree: http://git-wip-us.apache.org/repos/asf/incubator-impala/tree/dad48966
Diff: http://git-wip-us.apache.org/repos/asf/incubator-impala/diff/dad48966

Branch: refs/heads/master
Commit: dad4896693c99f6a59be5c25bed2297a1ed57c5f
Parents: 621af4b
Author: Matthew Jacobs <[email protected]>
Authored: Mon Jun 5 12:28:43 2017 -0700
Committer: Impala Public Jenkins <[email protected]>
Committed: Tue Jun 6 20:33:50 2017 +0000

----------------------------------------------------------------------
 be/src/runtime/timestamp-test.cc  | 14 ++++++++
 be/src/runtime/timestamp-value.cc | 61 ++++++++++++++++++++++++++++------
 2 files changed, 64 insertions(+), 11 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/dad48966/be/src/runtime/timestamp-test.cc
----------------------------------------------------------------------
diff --git a/be/src/runtime/timestamp-test.cc b/be/src/runtime/timestamp-test.cc
index 32fc60f..c18313b 100644
--- a/be/src/runtime/timestamp-test.cc
+++ b/be/src/runtime/timestamp-test.cc
@@ -667,6 +667,20 @@ TEST(TimestampTest, Basic) {
   EXPECT_EQ("2038-01-19 03:14:09",
       TimestampValue::FromUnixTime(2147483649).ToString());
 
+  // Test FromUnixTime around the boundary of the values that are converted 
via boost via
+  // gmtime (IMPALA-5357). Tests 1 second before and after the values 
supported by the
+  // boost conversion logic.
+  const int64_t MIN_BOOST_CONVERT_UNIX_TIME = -9223372036;
+  const int64_t MAX_BOOST_CONVERT_UNIX_TIME = 9223372036;
+  EXPECT_EQ("1677-09-21 00:12:43",
+      TimestampValue::FromUnixTime(MIN_BOOST_CONVERT_UNIX_TIME - 
1).ToString());
+  EXPECT_EQ("1677-09-21 00:12:44",
+      TimestampValue::FromUnixTime(MIN_BOOST_CONVERT_UNIX_TIME).ToString());
+  EXPECT_EQ("2262-04-11 23:47:16",
+      TimestampValue::FromUnixTime(MAX_BOOST_CONVERT_UNIX_TIME).ToString());
+  EXPECT_EQ("2262-04-11 23:47:17",
+      TimestampValue::FromUnixTime(MAX_BOOST_CONVERT_UNIX_TIME + 
1).ToString());
+
   // Test a leap second in 1998 represented by the UTC time 1998-12-31 
23:59:60.
   // Unix time cannot represent the leap second, which repeats 915148800.
   EXPECT_EQ("1998-12-31 23:59:59",

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/dad48966/be/src/runtime/timestamp-value.cc
----------------------------------------------------------------------
diff --git a/be/src/runtime/timestamp-value.cc 
b/be/src/runtime/timestamp-value.cc
index 32252a0..56dfbda 100644
--- a/be/src/runtime/timestamp-value.cc
+++ b/be/src/runtime/timestamp-value.cc
@@ -26,6 +26,7 @@
 using boost::date_time::not_a_date_time;
 using boost::gregorian::date;
 using boost::gregorian::date_duration;
+using boost::posix_time::from_time_t;
 using boost::posix_time::nanoseconds;
 using boost::posix_time::ptime;
 using boost::posix_time::ptime_from_tm;
@@ -111,21 +112,51 @@ ostream& operator<<(ostream& os, const TimestampValue& 
timestamp_value) {
   return os << timestamp_value.ToString();
 }
 
-ptime TimestampValue::UnixTimeToPtime(time_t unix_time) {
-  /// Unix times are represented internally in boost as 32 bit ints which 
limits the
-  /// range of dates to 1901-2038 
(https://svn.boost.org/trac/boost/ticket/3109), so
-  /// libc functions will be used instead.
-  // TODO: Conversion using libc is very expensive (IMPALA-5357); find an 
alternative.
+/// Return a ptime representation of the given Unix time (seconds since the 
Unix epoch).
+/// The time zone of the resulting ptime is local time. This is called by
+/// UnixTimeToPtime.
+inline ptime UnixTimeToLocalPtime(time_t unix_time) {
   tm temp_tm;
-  if (FLAGS_use_local_tz_for_unix_timestamp_conversions) {
-    if (UNLIKELY(localtime_r(&unix_time, &temp_tm) == nullptr)) {
-      return ptime(not_a_date_time);
-    }
-  } else {
-    if (UNLIKELY(gmtime_r(&unix_time, &temp_tm) == nullptr)) {
+  // TODO: avoid localtime*, which takes a global timezone db lock
+  if (UNLIKELY(localtime_r(&unix_time, &temp_tm) == nullptr)) {
+    return ptime(not_a_date_time);
+  }
+  try {
+    return ptime_from_tm(temp_tm);
+  } catch (std::exception&) {
+    return ptime(not_a_date_time);
+  }
+}
+
+/// Return a ptime representation of the given Unix time (seconds since the 
Unix epoch).
+/// The time zone of the resulting ptime is UTC.
+/// In order to avoid a serious performance degredation using libc 
(IMPALA-5357), this
+/// function uses boost to convert the time_t to a ptime. Unfortunately, 
because the boost
+/// conversion relies on time_duration to represent the time_t and internally
+/// time_duration stores nanosecond precision ticks, the 'fast path' 
conversion using
+/// boost can only handle a limited range of dates (appx years 1677-2622, 
while Impala
+/// supports years 1600-9999). For dates outside this range, the conversion 
will instead
+/// use the libc function gmtime_r which supports those dates but takes the 
global lock
+/// for the timezone db (even though technically it is not needed for the 
conversion,
+/// again see IMPALA-5357). This is called by UnixTimeToPtime.
+inline ptime UnixTimeToUtcPtime(time_t unix_time) {
+  // Minimum Unix time that can be converted with from_time_t: 1677-Sep-21 
00:12:44
+  const int64_t MIN_BOOST_CONVERT_UNIX_TIME = -9223372036;
+  // Maximum Unix time that can be converted with from_time_t: 2262-Apr-11 
23:47:16
+  const int64_t MAX_BOOST_CONVERT_UNIX_TIME = 9223372036;
+  if (LIKELY(unix_time >= MIN_BOOST_CONVERT_UNIX_TIME &&
+             unix_time <= MAX_BOOST_CONVERT_UNIX_TIME)) {
+    try {
+      return from_time_t(unix_time);
+    } catch (std::exception&) {
       return ptime(not_a_date_time);
     }
   }
+
+  tm temp_tm;
+  if (UNLIKELY(gmtime_r(&unix_time, &temp_tm) == nullptr)) {
+    return ptime(not_a_date_time);
+  }
   try {
     return ptime_from_tm(temp_tm);
   } catch (std::exception&) {
@@ -133,6 +164,14 @@ ptime TimestampValue::UnixTimeToPtime(time_t unix_time) {
   }
 }
 
+ptime TimestampValue::UnixTimeToPtime(time_t unix_time) {
+  if (FLAGS_use_local_tz_for_unix_timestamp_conversions) {
+    return UnixTimeToLocalPtime(unix_time);
+  } else {
+    return UnixTimeToUtcPtime(unix_time);
+  }
+}
+
 string TimestampValue::ToString() const {
   stringstream ss;
   if (HasDate()) {

Reply via email to