Repository: impala
Updated Branches:
  refs/heads/master 7793124b6 -> 2ee8caeb3


IMPALA-7521: Speed up sub-second unix time->TimestampValue conversions

Impala used to convert from sub-second unix time to TimestampValue
(which is split to date_ and time_ similarly to
boost::posix_time::ptime) by first splitting the input into seconds
and sub-seconds part, converting the seconds part wit
boost::posix_time::from_time_t(), and then adding the sub-seconds
part to this timestamp.

Different tricks are used to speed up different functions:
- UTC functions that expect a single integer as input can
  split it into date_ and time_ directly.
- Non-UTC functions need seconds for timezone conversion,
  because CCTZ expects time points as seconds. These
  were optimized by adding the subsecond part to time_
  instead of adding it to a ptime. This can be done safely
  because the sub-second part is between [0, 1 sec), so
  it cannot overflow into a different day or timezone.

Benchmarks show 2x - 6x speedup for the measured functions.

The main motivation is IMPALA-5050: "Add support to read
TIMESTAMP_MILLIS and TIMESTAMP_MICROS to the parquet
scanner" - reading these types will run
micro/milli->TimestampValue conversion for every row.

Other changes:
- TimestampValue::UtcFromUnixTimeMillis was added - currently this
  is only used in tests but it will be useful for IMPALA-5050
- Some functions were moved from .h to .inline.h.
- FromUnixTimeMicros was changed to do the utc->local conversion
  depending on flag use_local_tz_for_unix_timestamp_conversions
  to be consistent with other similar functions. This function was
  only used in tests until now but it will be useful for IMPALA-5050.
- When a result mismatch is detected in
  convert-timestamp-benchmark.cc it now prints non-equal values.
- Benchmarks were added for micro + nano conversions.
  Note that only single threaded benchmarks were added because I
  do not expect any difference in the multi threaded case.
- DCHECKs were added to TimeStampValue::Validate to
  ensure that time_ is between [0, 24 hour).

Testing:
- timestamp-test.cc was extended to give better coverage
  for sub-second conversions. Edge cases were already
  covered pretty well.

Change-Id: I572b5876b979ddae58165bd40d5b008ce9d7a4aa
Reviewed-on: http://gerrit.cloudera.org:8080/11183
Reviewed-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>


Project: http://git-wip-us.apache.org/repos/asf/impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/impala/commit/2ee8caeb
Tree: http://git-wip-us.apache.org/repos/asf/impala/tree/2ee8caeb
Diff: http://git-wip-us.apache.org/repos/asf/impala/diff/2ee8caeb

Branch: refs/heads/master
Commit: 2ee8caeb3053dfa2c434c680ffb2ac756627ee38
Parents: 7793124
Author: Csaba Ringhofer <csringho...@cloudera.com>
Authored: Fri Aug 10 11:58:03 2018 +0200
Committer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Committed: Mon Sep 17 14:19:00 2018 +0000

----------------------------------------------------------------------
 .../benchmarks/convert-timestamp-benchmark.cc   | 183 ++++++++++++++++++-
 be/src/exec/data-source-scan-node.cc            |   2 +-
 be/src/exec/hdfs-orc-scanner.cc                 |   1 +
 be/src/exprs/decimal-operators-ir.cc            |   1 +
 be/src/exprs/expr-test.cc                       |   6 +-
 be/src/runtime/timestamp-test.cc                | 129 +++++++++++--
 be/src/runtime/timestamp-value.cc               |  43 +----
 be/src/runtime/timestamp-value.h                |  71 +++----
 be/src/runtime/timestamp-value.inline.h         |  50 ++++-
 9 files changed, 378 insertions(+), 108 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/impala/blob/2ee8caeb/be/src/benchmarks/convert-timestamp-benchmark.cc
----------------------------------------------------------------------
diff --git a/be/src/benchmarks/convert-timestamp-benchmark.cc 
b/be/src/benchmarks/convert-timestamp-benchmark.cc
index 235e3d9..857f13a 100644
--- a/be/src/benchmarks/convert-timestamp-benchmark.cc
+++ b/be/src/benchmarks/convert-timestamp-benchmark.cc
@@ -97,9 +97,28 @@ UnixTimeToLocalPtime:      Function  iters/ms   10%ile   
50%ile   90%ile     10%
 UnixTimeToUtcPtime:        Function  iters/ms   10%ile   50%ile   90%ile     
10%ile     50%ile     90%ile
                                                                          
(relative) (relative) (relative)
 
---------------------------------------------------------------------------------------------------------
-                            (glibc)               18.8     19.2     19.5       
  1X         1X         1X
-                      (Google/CCTZ)                  8     8.14     8.27     
0.425X     0.425X     0.424X
-                        (fast path)               29.5     30.2     30.8      
1.57X      1.58X      1.58X
+                            (glibc)                 17     17.6     17.9       
  1X         1X         1X
+                      (Google/CCTZ)               6.45     6.71     6.81     
0.379X     0.382X      0.38X
+                        (fast path)               25.1       26     26.4      
1.47X      1.48X      1.48X
+                        (day split)               48.6     50.3     51.3      
2.85X      2.87X      2.86X
+
+UtcFromUnixTimeMicros:     Function  iters/ms   10%ile   50%ile   90%ile     
10%ile     50%ile     90%ile
+                                                                         
(relative) (relative) (relative)
+---------------------------------------------------------------------------------------------------------
+                  (sec split (old))               17.9     18.7     19.1       
  1X         1X         1X
+                        (day split)                111      116      118      
6.21X      6.19X      6.19X
+
+FromUnixTimeNanos:         Function  iters/ms   10%ile   50%ile   90%ile     
10%ile     50%ile     90%ile
+                                                                         
(relative) (relative) (relative)
+---------------------------------------------------------------------------------------------------------
+                  (sec split (old))               18.7     19.5     19.8       
  1X         1X         1X
+                  (sec split (new))                104      108      110      
5.58X      5.55X      5.57X
+
+FromSubsecondUnixTime:     Function  iters/ms   10%ile   50%ile   90%ile     
10%ile     50%ile     90%ile
+                                                                         
(relative) (relative) (relative)
+---------------------------------------------------------------------------------------------------------
+                              (old)               18.7     18.7     18.7       
  1X         1X         1X
+                              (new)               73.5     74.1     74.1      
3.94X      3.96X      3.96X
 
 Number of threads: 8
 
@@ -232,14 +251,17 @@ void bail_if_results_dont_match(const vector<const 
vector<T>*>& test_result_vec)
 
   auto b = test_result_vec.begin();
   for (auto it = b + 1; it != test_result_vec.end(); ++it) {
-    if (**b != **it) {
-      cerr << "Results don't match.";
-      exit(1);
+    auto& references = **b;
+    auto& results = **it;
+    for (int i = 0; i < references.size(); ++i) {
+      if (references[i] != results[i]) {
+        cerr << "Results don't match: " << references[i] << " vs " << 
results[i] << endl;
+        exit(1);
+      }
     }
   }
 }
 
-
 //
 // Test UtcToUnixTime (boost is expected to be the fastest, followed by CCTZ 
and glibc)
 //
@@ -536,6 +558,68 @@ boost::posix_time::ptime 
cctz_optimized_unix_time_to_utc_ptime(const time_t& uni
   return cctz_unix_time_to_utc_ptime(unix_time);
 }
 
+boost::posix_time::ptime split_unix_time_to_utc_ptime(const time_t& unix_time) 
{
+  int64_t time = unix_time;
+  int32_t days = TimestampValue::SplitTime<24*60*60>(&time);
+
+  return boost::posix_time::ptime(
+      boost::gregorian::date(1970, 1, 1) + 
boost::gregorian::date_duration(days),
+      boost::posix_time::nanoseconds(time*NANOS_PER_SEC));
+}
+
+TimestampValue sec_split_utc_from_unix_time_micros(const int64_t& 
unix_time_micros) {
+  int64_t ts_seconds = unix_time_micros / MICROS_PER_SEC;
+  int64_t micros_part = unix_time_micros - (ts_seconds * MICROS_PER_SEC);
+  boost::posix_time::ptime temp = 
cctz_optimized_unix_time_to_utc_ptime(ts_seconds);
+  temp += boost::posix_time::microseconds(micros_part);
+  return TimestampValue(temp);
+}
+
+TimestampValue day_split_utc_from_unix_time_micros(const int64_t& 
unix_time_micros) {
+  static const boost::gregorian::date EPOCH(1970,1,1);
+  int64_t micros = unix_time_micros;
+  int32_t days = TimestampValue::SplitTime<24LL*60*60*MICROS_PER_SEC>(&micros);
+
+  return TimestampValue(
+      EPOCH + boost::gregorian::date_duration(days),
+      boost::posix_time::nanoseconds(micros*1000));
+}
+
+struct SplitNanoAndSecond {
+  int64_t seconds;
+  int64_t nanos;
+};
+
+TimestampValue old_split_utc_from_unix_time_nanos(const SplitNanoAndSecond& 
unix_time) {
+  boost::posix_time::ptime temp =
+      cctz_optimized_unix_time_to_utc_ptime(unix_time.seconds);
+  temp += boost::posix_time::nanoseconds(unix_time.nanos);
+  return TimestampValue(temp);
+}
+
+TimestampValue new_split_utc_from_unix_time_nanos(const SplitNanoAndSecond& 
unix_time) {
+  // The TimestampValue version is used as it is hard to reproduce the same 
logic without
+  // accessing private members.
+  return TimestampValue::FromUnixTimeNanos(unix_time.seconds, unix_time.nanos,
+      TimezoneDatabase::GetUtcTimezone());
+}
+
+TimestampValue from_subsecond_unix_time_old(const double& unix_time) {
+  const double ONE_BILLIONTH = 0.000000001;
+  const time_t unix_time_whole = unix_time;
+  boost::posix_time::ptime temp =
+      cctz_optimized_unix_time_to_utc_ptime(unix_time_whole);
+  temp += boost::posix_time::nanoseconds((unix_time - unix_time_whole) / 
ONE_BILLIONTH);
+  return TimestampValue(temp);
+}
+
+TimestampValue from_subsecond_unix_time_new(const double& unix_time) {
+  const double ONE_BILLIONTH = 0.000000001;
+  int64_t unix_time_whole = unix_time;
+  int64_t nanos = (unix_time - unix_time_whole) / ONE_BILLIONTH;
+  return TimestampValue::FromUnixTimeNanos(
+      unix_time_whole, nanos, TimezoneDatabase::GetUtcTimezone());
+}
 
 //
 // Test ToUtc (CCTZ is expected to be faster than boost)
@@ -745,6 +829,10 @@ int main(int argc, char* argv[]) {
       time_t,
       boost::posix_time::ptime,
       fastpath_unix_time_to_utc_ptime> fastpath_unix_time_to_utc_ptime_data = 
time_data;
+  TestData<
+      time_t,
+      boost::posix_time::ptime,
+      split_unix_time_to_utc_ptime> split_unix_time_to_utc_ptime_data = 
time_data;
 
   glibc_unix_time_to_utc_ptime_data.add_to_benchmark(bm_unix_time_to_utc_ptime,
       "(glibc)");
@@ -752,12 +840,91 @@ int main(int argc, char* argv[]) {
       "(Google/CCTZ)");
   
fastpath_unix_time_to_utc_ptime_data.add_to_benchmark(bm_unix_time_to_utc_ptime,
       "(fast path)");
+  split_unix_time_to_utc_ptime_data.add_to_benchmark(bm_unix_time_to_utc_ptime,
+      "(day split)");
   cout << bm_unix_time_to_utc_ptime.Measure() << endl;
 
   bail_if_results_dont_match(vector<const vector<boost::posix_time::ptime>*>{
       &glibc_unix_time_to_utc_ptime_data.result(),
       &cctz_unix_time_to_utc_ptime_data.result(),
-      &fastpath_unix_time_to_utc_ptime_data.result()});
+      &fastpath_unix_time_to_utc_ptime_data.result(),
+      &split_unix_time_to_utc_ptime_data.result()});
+
+  // Benchmark UtcFromUnixTimeMicros improvement in IMPALA-7417.
+  vector<time_t> microsec_data;
+  for (int i = 0; i < tsvalue_data.size(); ++i) {
+    const TimestampValue& tsvalue = tsvalue_data[i];
+    time_t unix_time;
+    tsvalue.ToUnixTime(TimezoneDatabase::GetUtcTimezone(), &unix_time);
+    int micros = (i * 1001) % MICROS_PER_SEC; // add some sub-second part
+    microsec_data.push_back(unix_time * MICROS_PER_SEC + micros);
+  }
+
+  Benchmark bm_utc_from_unix_time_micros("UtcFromUnixTimeMicros");
+  TestData<int64_t, TimestampValue, sec_split_utc_from_unix_time_micros>
+      sec_split_utc_from_unix_time_micros_data = microsec_data;
+  TestData<int64_t, TimestampValue, day_split_utc_from_unix_time_micros>
+      day_split_utc_from_unix_time_micros_data = microsec_data;
+
+  
sec_split_utc_from_unix_time_micros_data.add_to_benchmark(bm_utc_from_unix_time_micros,
+      "(sec split (old))");
+  
day_split_utc_from_unix_time_micros_data.add_to_benchmark(bm_utc_from_unix_time_micros,
+      "(day split)");
+  cout << bm_utc_from_unix_time_micros.Measure() << endl;
+
+  bail_if_results_dont_match(vector<const vector<TimestampValue>*>{
+      &sec_split_utc_from_unix_time_micros_data.result(),
+      &day_split_utc_from_unix_time_micros_data.result()});
+
+  // Benchmark FromUnixTimeNanos improvement in IMPALA-7417.
+  vector<SplitNanoAndSecond> nanosec_data;
+  for (int i = 0; i < tsvalue_data.size(); ++i) {
+    const TimestampValue& tsvalue = tsvalue_data[i];
+    time_t unix_time;
+    tsvalue.ToUnixTime(TimezoneDatabase::GetUtcTimezone(), &unix_time);
+    int nanos = (i * 1001) % NANOS_PER_SEC; // add some sub-second part
+    nanosec_data.push_back(SplitNanoAndSecond {unix_time, nanos} );
+  }
+
+  Benchmark bm_utc_from_unix_time_nanos("FromUnixTimeNanos");
+  TestData<SplitNanoAndSecond, TimestampValue, 
old_split_utc_from_unix_time_nanos>
+      old_split_utc_from_unix_time_nanos_data = nanosec_data;
+  TestData<SplitNanoAndSecond, TimestampValue, 
new_split_utc_from_unix_time_nanos>
+      new_split_utc_from_unix_time_nanos_data = nanosec_data;
+
+  
old_split_utc_from_unix_time_nanos_data.add_to_benchmark(bm_utc_from_unix_time_nanos,
+      "(sec split (old))");
+  
new_split_utc_from_unix_time_nanos_data.add_to_benchmark(bm_utc_from_unix_time_nanos,
+      "(sec split (new))");
+  cout << bm_utc_from_unix_time_nanos.Measure() << endl;
+
+  bail_if_results_dont_match(vector<const vector<TimestampValue>*>{
+      &old_split_utc_from_unix_time_nanos_data.result(),
+      &new_split_utc_from_unix_time_nanos_data.result()});
+
+  // Benchmark FromSubsecondUnixTime before and after IMPALA-7417.
+  vector<double> double_data;
+  for (int i = 0; i < tsvalue_data.size(); ++i) {
+    const TimestampValue& tsvalue = tsvalue_data[i];
+    time_t unix_time;
+    tsvalue.ToUnixTime(TimezoneDatabase::GetUtcTimezone(), &unix_time);
+    double nanos = (i * 1001) % NANOS_PER_SEC; // add some sub-second part
+    double_data.push_back((double)unix_time + nanos / NANOS_PER_SEC);
+  }
+
+  Benchmark from_subsecond_unix_time("FromSubsecondUnixTime");
+  TestData<double, TimestampValue, from_subsecond_unix_time_old>
+      from_subsecond_unix_time_old_data = double_data;
+  TestData<double, TimestampValue, from_subsecond_unix_time_new>
+      from_subsecond_unix_time_new_data = double_data;
+
+  from_subsecond_unix_time_old_data.add_to_benchmark(from_subsecond_unix_time, 
"(old)");
+  from_subsecond_unix_time_new_data.add_to_benchmark(from_subsecond_unix_time, 
"(new)");
+  cout << from_subsecond_unix_time.Measure() << endl;
+
+  bail_if_results_dont_match(vector<const vector<TimestampValue>*>{
+      &from_subsecond_unix_time_old_data.result(),
+      &from_subsecond_unix_time_new_data.result()});
 
   // If number of threads is specified, run multithreaded tests.
   int num_of_threads = (argc < 2) ? 0 : atoi(argv[1]);

http://git-wip-us.apache.org/repos/asf/impala/blob/2ee8caeb/be/src/exec/data-source-scan-node.cc
----------------------------------------------------------------------
diff --git a/be/src/exec/data-source-scan-node.cc 
b/be/src/exec/data-source-scan-node.cc
index fa11d05..10f0c7d 100644
--- a/be/src/exec/data-source-scan-node.cc
+++ b/be/src/exec/data-source-scan-node.cc
@@ -29,7 +29,7 @@
 #include "runtime/runtime-state.h"
 #include "runtime/row-batch.h"
 #include "runtime/string-value.h"
-#include "runtime/timestamp-value.h"
+#include "runtime/timestamp-value.inline.h"
 #include "runtime/tuple-row.h"
 #include "util/jni-util.h"
 #include "util/periodic-counter-updater.h"

http://git-wip-us.apache.org/repos/asf/impala/blob/2ee8caeb/be/src/exec/hdfs-orc-scanner.cc
----------------------------------------------------------------------
diff --git a/be/src/exec/hdfs-orc-scanner.cc b/be/src/exec/hdfs-orc-scanner.cc
index bbe9b12..9979688 100644
--- a/be/src/exec/hdfs-orc-scanner.cc
+++ b/be/src/exec/hdfs-orc-scanner.cc
@@ -24,6 +24,7 @@
 #include "runtime/exec-env.h"
 #include "runtime/io/request-context.h"
 #include "runtime/runtime-filter.inline.h"
+#include "runtime/timestamp-value.inline.h"
 #include "runtime/tuple-row.h"
 #include "util/decompress.h"
 

http://git-wip-us.apache.org/repos/asf/impala/blob/2ee8caeb/be/src/exprs/decimal-operators-ir.cc
----------------------------------------------------------------------
diff --git a/be/src/exprs/decimal-operators-ir.cc 
b/be/src/exprs/decimal-operators-ir.cc
index 0a41f97..c987b9d 100644
--- a/be/src/exprs/decimal-operators-ir.cc
+++ b/be/src/exprs/decimal-operators-ir.cc
@@ -26,6 +26,7 @@
 #include "exprs/scalar-expr.h"
 #include "exprs/timezone_db.h"
 #include "runtime/decimal-value.inline.h"
+#include "runtime/timestamp-value.inline.h"
 #include "util/decimal-util.h"
 #include "util/string-parser.h"
 

http://git-wip-us.apache.org/repos/asf/impala/blob/2ee8caeb/be/src/exprs/expr-test.cc
----------------------------------------------------------------------
diff --git a/be/src/exprs/expr-test.cc b/be/src/exprs/expr-test.cc
index e081271..dd520ba 100644
--- a/be/src/exprs/expr-test.cc
+++ b/be/src/exprs/expr-test.cc
@@ -170,13 +170,15 @@ class ScopedTimeZoneOverride {
 
 // Enable FLAGS_use_local_tz_for_unix_timestamp_conversions for the duration 
of the scope.
 class ScopedLocalUnixTimestampConversionOverride {
+ bool original_;
  public:
   ScopedLocalUnixTimestampConversionOverride() {
+    original_ = FLAGS_use_local_tz_for_unix_timestamp_conversions;
     FLAGS_use_local_tz_for_unix_timestamp_conversions = true;
   }
 
   ~ScopedLocalUnixTimestampConversionOverride() {
-    FLAGS_use_local_tz_for_unix_timestamp_conversions = false;
+    FLAGS_use_local_tz_for_unix_timestamp_conversions = original_;
   }
 };
 
@@ -6520,6 +6522,7 @@ TEST_F(ExprTest, TimestampFunctions) {
   // Test that now() and current_timestamp() are reasonable.
   {
     ScopedTimeZoneOverride time_zone("PST8PDT");
+    ScopedLocalUnixTimestampConversionOverride use_local;
     const Timezone& local_tz = time_zone.GetTimezone();
 
     const TimestampValue start_time =
@@ -6545,6 +6548,7 @@ TEST_F(ExprTest, TimestampFunctions) {
   // Test cast(unix_timestamp() as timestamp).
   {
     ScopedTimeZoneOverride time_zone("PST8PDT");
+    ScopedLocalUnixTimestampConversionOverride use_local;
     const Timezone& local_tz = time_zone.GetTimezone();
 
     // UNIX_TIMESTAMP() has second precision so the comparison start time is 
shifted back

http://git-wip-us.apache.org/repos/asf/impala/blob/2ee8caeb/be/src/runtime/timestamp-test.cc
----------------------------------------------------------------------
diff --git a/be/src/runtime/timestamp-test.cc b/be/src/runtime/timestamp-test.cc
index 8afa957..97ceba9 100644
--- a/be/src/runtime/timestamp-test.cc
+++ b/be/src/runtime/timestamp-test.cc
@@ -230,6 +230,66 @@ void TestTimestampTokens(vector<TimestampToken>* toks, int 
year, int month,
   } while (next_permutation(toks->begin(), toks->end()));
 }
 
+// Checks that FromSubsecondUnixTime gives the correct result. Conversion to 
double
+// is lossy so the result is expected to be within a certain range of 
'expected'.
+// The fraction part of 'nanos' can express sub-nanoseconds - the current 
logic is to
+// truncate to the next whole nanosecond (towards 0).
+void TestFromDoubleUnixTime(
+    int64_t seconds, int64_t millis, double nanos, const TimestampValue& 
expected) {
+  auto tz = TimezoneDatabase::GetUtcTimezone();
+  TimestampValue from_double = TimestampValue::FromSubsecondUnixTime(
+      1.0 * seconds + 0.001 * millis + nanos / NANOS_PER_SEC, tz);
+
+  if (!expected.HasDate()) EXPECT_FALSE(from_double.HasDate());
+  else {
+    // Conversion to double is lossy so the timestamp can be a bit different.
+    int64_t expected_rounded_to_micros, from_double_rounded_to_micros;
+    EXPECT_TRUE(expected.UtcToUnixTimeMicros(&expected_rounded_to_micros));
+    
EXPECT_TRUE(from_double.UtcToUnixTimeMicros(&from_double_rounded_to_micros));
+    // The difference can be more than a microsec in case of timestamps far 
from 1970.
+    int64_t MARGIN_OF_ERROR = 8;
+    EXPECT_LT(abs(expected_rounded_to_micros - from_double_rounded_to_micros),
+        MARGIN_OF_ERROR);
+  }
+}
+
+// Checks that all sub-second From*UnixTime gives the same result and that the 
result
+// is the same as 'expected'.
+// If 'expected' is nullptr then the result is expected to be invalid (out of 
range).
+void TestFromSubSecondFunctions(int64_t seconds, int64_t millis, const char* 
expected) {
+  auto tz = TimezoneDatabase::GetUtcTimezone();
+
+  TimestampValue from_millis =
+      TimestampValue::UtcFromUnixTimeMillis(seconds * 1000 + millis);
+  if (expected == nullptr) EXPECT_FALSE(from_millis.HasDate());
+  else EXPECT_EQ(expected, from_millis.ToString());
+
+  EXPECT_EQ(from_millis, TimestampValue::UtcFromUnixTimeMicros(
+      seconds * MICROS_PER_SEC + millis * 1000));
+  EXPECT_EQ(from_millis, TimestampValue::FromUnixTimeMicros(
+      seconds * MICROS_PER_SEC + millis * 1000, tz));
+
+  // Check the same timestamp shifted with some sub-nanosecs.
+  vector<double> sub_nanosec_offsets =
+      {0.0, 0.1, 0.9, 0.000001, 0.999999, 2.2250738585072020e-308};
+  for (double sub_nanos: sub_nanosec_offsets) {
+    TestFromDoubleUnixTime(seconds, millis, sub_nanos, from_millis);
+    TestFromDoubleUnixTime(seconds, millis, -sub_nanos, from_millis);
+  }
+
+  // Test FromUnixTimeNanos with shifted sec + subsec pairs.
+  vector<int64_t> signs = {-1, 1};
+  vector<int64_t> offsets = {0, 1, 2, 60, 60*60, 24*60*60};
+  for (int64_t sign: signs) {
+    for (int64_t offset: offsets) {
+      int64_t shifted_seconds = seconds + sign * offset;
+      int64_t shifted_nanos = (millis - 1000 * sign * offset) * 1000 * 1000;
+      EXPECT_EQ(from_millis,
+          TimestampValue::FromUnixTimeNanos(shifted_seconds, shifted_nanos, 
tz));
+    }
+  }
+}
+
 TEST(TimestampTest, Basic) {
   // Fix current time to determine the behavior parsing 2-digit year format
   // Set it to 03/01 to test 02/29 edge cases.
@@ -668,12 +728,9 @@ TEST(TimestampTest, Basic) {
   // Sub-second FromUnixTime functions incorrectly accepted the last second of 
1399
   // as valid, because validation logic checked the nearest second rounded 
towards 0
   // (IMPALA-5664).
-  EXPECT_FALSE(TimestampValue::FromSubsecondUnixTime(
-      MIN_DATE_AS_UNIX_TIME - 0.1, utc_tz).HasDate());
-  EXPECT_FALSE(TimestampValue::UtcFromUnixTimeMicros(
-      MIN_DATE_AS_UNIX_TIME * MICROS_PER_SEC - 2000).HasDate());
-  EXPECT_FALSE(TimestampValue::FromUnixTimeNanos(
-      MIN_DATE_AS_UNIX_TIME, -NANOS_PER_MICRO * 100, utc_tz).HasDate());
+  TestFromSubSecondFunctions(MIN_DATE_AS_UNIX_TIME, -100, nullptr);
+  TestFromSubSecondFunctions(MIN_DATE_AS_UNIX_TIME, 100,
+      "1400-01-01 00:00:00.100000000");
 
   // Test the max supported date that can be represented in seconds.
   const int64_t MAX_DATE_AS_UNIX_TIME = 253402300799;
@@ -716,20 +773,9 @@ TEST(TimestampTest, Basic) {
   EXPECT_FALSE(too_late.HasTime());
 
   // Checking sub-second FromUnixTime functions near 10000.01.01
-  EXPECT_TRUE(TimestampValue::FromSubsecondUnixTime(
-      MAX_DATE_AS_UNIX_TIME + 0.99, utc_tz).HasDate());
-  EXPECT_FALSE(TimestampValue::FromSubsecondUnixTime(
-      MAX_DATE_AS_UNIX_TIME + 1, utc_tz).HasDate());
-
-  EXPECT_TRUE(TimestampValue::UtcFromUnixTimeMicros(
-      MAX_DATE_AS_UNIX_TIME * MICROS_PER_SEC + MICROS_PER_SEC - 1).HasDate());
-  EXPECT_FALSE(TimestampValue::UtcFromUnixTimeMicros(
-      MAX_DATE_AS_UNIX_TIME * MICROS_PER_SEC + MICROS_PER_SEC).HasDate());
-
-  EXPECT_TRUE(TimestampValue::FromUnixTimeNanos(
-      MAX_DATE_AS_UNIX_TIME, NANOS_PER_SEC - 1, utc_tz).HasDate());
-  EXPECT_FALSE(TimestampValue::FromUnixTimeNanos(
-      MAX_DATE_AS_UNIX_TIME, NANOS_PER_SEC, utc_tz).HasDate());
+  TestFromSubSecondFunctions(MAX_DATE_AS_UNIX_TIME, MILLIS_PER_SEC - 1,
+      "9999-12-31 23:59:59.999000000");
+  TestFromSubSecondFunctions(MAX_DATE_AS_UNIX_TIME, MILLIS_PER_SEC, nullptr);
 
   // Regression tests for IMPALA-1676, Unix times overflow int32 during year 
2038
   EXPECT_EQ("2038-01-19 03:14:08",
@@ -804,6 +850,49 @@ TEST(TimestampTest, Basic) {
       TimestampValue::FromSubsecondUnixTime(0.008, utc_tz).ToString());
 }
 
+// Test subsecond unix time conversion for non edge cases.
+TEST(TimestampTest, SubSecond) {
+  // Test with millisec precision.
+  TestFromSubSecondFunctions(0, 0, "1970-01-01 00:00:00");
+  TestFromSubSecondFunctions(0, 100, "1970-01-01 00:00:00.100000000");
+  TestFromSubSecondFunctions(0, 1100, "1970-01-01 00:00:01.100000000");
+  TestFromSubSecondFunctions(0, 24*60*60*1000, "1970-01-02 00:00:00");
+  TestFromSubSecondFunctions(0, 2*24*60*60*1000 + 100, "1970-01-03 
00:00:00.100000000");
+
+  TestFromSubSecondFunctions(0, -100, "1969-12-31 23:59:59.900000000");
+  TestFromSubSecondFunctions(0, -1100, "1969-12-31 23:59:58.900000000");
+  TestFromSubSecondFunctions(0, -24*60*60*1000, "1969-12-31 00:00:00");
+  TestFromSubSecondFunctions(0, -2*24*60*60*1000 + 100, "1969-12-30 
00:00:00.100000000");
+
+  TestFromSubSecondFunctions(-1, 0, "1969-12-31 23:59:59");
+  TestFromSubSecondFunctions(-1, 100, "1969-12-31 23:59:59.100000000");
+  TestFromSubSecondFunctions(-1, 1100, "1970-01-01 00:00:00.100000000");
+  TestFromSubSecondFunctions(-1, 24*60*60*1000, "1970-01-01 23:59:59");
+  TestFromSubSecondFunctions(-1, 2*24*60*60*1000 + 100, "1970-01-02 
23:59:59.100000000");
+
+  TestFromSubSecondFunctions(-1, -100, "1969-12-31 23:59:58.900000000");
+  TestFromSubSecondFunctions(-1, -1100, "1969-12-31 23:59:57.900000000");
+  TestFromSubSecondFunctions(-1, -24*60*60*1000, "1969-12-30 23:59:59");
+  TestFromSubSecondFunctions(-1, -2*24*60*60*1000 + 100, "1969-12-29 
23:59:59.100000000");
+
+  // A few test with sub-millisec precision.
+  auto tz = TimezoneDatabase::GetUtcTimezone();
+  EXPECT_EQ("1970-01-01 00:00:00.000001000",
+      TimestampValue::UtcFromUnixTimeMicros(1).ToString());
+  EXPECT_EQ("1969-12-31 23:59:59.999999000",
+      TimestampValue::UtcFromUnixTimeMicros(-1).ToString());
+
+  EXPECT_EQ("1970-01-01 00:00:00.000001000",
+      TimestampValue::FromUnixTimeMicros(1, tz).ToString());
+  EXPECT_EQ("1969-12-31 23:59:59.999999000",
+      TimestampValue::FromUnixTimeMicros(-1, tz).ToString());
+
+  EXPECT_EQ("1970-01-01 00:00:00.000000001",
+      TimestampValue::FromUnixTimeNanos(0, 1, tz).ToString());
+  EXPECT_EQ("1969-12-31 23:59:59.999999999",
+      TimestampValue::FromUnixTimeNanos(0, -1, tz).ToString());
+}
+
 }
 
 IMPALA_TEST_MAIN();

http://git-wip-us.apache.org/repos/asf/impala/blob/2ee8caeb/be/src/runtime/timestamp-value.cc
----------------------------------------------------------------------
diff --git a/be/src/runtime/timestamp-value.cc 
b/be/src/runtime/timestamp-value.cc
index 50d1ee4..8967174 100644
--- a/be/src/runtime/timestamp-value.cc
+++ b/be/src/runtime/timestamp-value.cc
@@ -140,7 +140,7 @@ void TimestampValue::LocalToUtc(const Timezone& local_tz) {
     SetToInvalidDateTime();
   } else {
     int64_t nanos = time_.fractional_seconds();
-    *this = UnixTimeToUtcPtime(TimePointToUnixTime(from_cl.pre));
+    *this = UtcFromUnixTimeTicks<1>(TimePointToUnixTime(from_cl.pre));
     // Time-zone conversion rules don't affect fractional seconds, leave them 
intact.
     time_ += nanoseconds(nanos);
   }
@@ -150,10 +150,8 @@ ostream& operator<<(ostream& os, const TimestampValue& 
timestamp_value) {
   return os << timestamp_value.ToString();
 }
 
-/// Return a ptime representation of the given Unix time (seconds since the 
Unix epoch).
-/// The time zone of the resulting ptime is 'local_tz'. This is called by 
UnixTimeToPtime.
-ptime TimestampValue::UnixTimeToLocalPtime(time_t unix_time,
-    const Timezone& local_tz) {
+TimestampValue TimestampValue::UnixTimeToLocal(
+    time_t unix_time, const Timezone& local_tz) {
   cctz::time_point<cctz::sys_seconds> from_tp = UnixTimeToTimePoint(unix_time);
   cctz::civil_second to_cs = cctz::convert(from_tp, local_tz);
   // boost::gregorian::date() throws boost::gregorian::bad_year if year is not 
in the
@@ -161,44 +159,17 @@ ptime TimestampValue::UnixTimeToLocalPtime(time_t 
unix_time,
   if (UNLIKELY(IsDateOutOfRange(to_cs))) {
     return ptime(not_a_date_time);
   } else {
-    return ptime(
+    return TimestampValue(
         boost::gregorian::date(to_cs.year(), to_cs.month(), to_cs.day()),
         boost::posix_time::time_duration(to_cs.hour(), to_cs.minute(), 
to_cs.second()));
   }
 }
 
-/// Return a ptime representation of the given Unix time (seconds since the 
Unix epoch).
-/// The time zone of the resulting ptime is UTC.
-/// In order to avoid a serious performance degredation using CCTZ, this 
function uses
-/// boost to convert the time_t to a ptime. Unfortunately, because the boost 
conversion
-/// relies on time_duration to represent the time_t and internally
-/// time_duration stores nanosecond precision ticks, the 'fast path' 
conversion using
-/// boost can only handle a limited range of dates (appx years 1677-2622, 
while Impala
-/// supports years 1400-9999). For dates outside this range, the conversion 
will instead
-/// use the CCTZ function convert which supports those dates. This is called by
-/// UnixTimeToPtime.
-ptime TimestampValue::UnixTimeToUtcPtime(time_t unix_time) {
-  // Minimum Unix time that can be converted with from_time_t: 1677-Sep-21 
00:12:44
-  const int64_t MIN_BOOST_CONVERT_UNIX_TIME = -9223372036;
-  // Maximum Unix time that can be converted with from_time_t: 2262-Apr-11 
23:47:16
-  const int64_t MAX_BOOST_CONVERT_UNIX_TIME = 9223372036;
-  if (LIKELY(unix_time >= MIN_BOOST_CONVERT_UNIX_TIME &&
-             unix_time <= MAX_BOOST_CONVERT_UNIX_TIME)) {
-    try {
-      return from_time_t(unix_time);
-    } catch (std::exception&) {
-      return ptime(not_a_date_time);
-    }
-  }
-
-  return UnixTimeToLocalPtime(unix_time, TimezoneDatabase::GetUtcTimezone());
-}
-
-ptime TimestampValue::UnixTimeToPtime(time_t unix_time, const Timezone& 
local_tz) {
+TimestampValue TimestampValue::FromUnixTime(time_t unix_time, const Timezone& 
local_tz) {
   if (FLAGS_use_local_tz_for_unix_timestamp_conversions) {
-    return UnixTimeToLocalPtime(unix_time, local_tz);
+    return UnixTimeToLocal(unix_time, local_tz);
   } else {
-    return UnixTimeToUtcPtime(unix_time);
+    return UtcFromUnixTimeTicks<1>(unix_time);
   }
 }
 

http://git-wip-us.apache.org/repos/asf/impala/blob/2ee8caeb/be/src/runtime/timestamp-value.h
----------------------------------------------------------------------
diff --git a/be/src/runtime/timestamp-value.h b/be/src/runtime/timestamp-value.h
index 2509f21..31b7000 100644
--- a/be/src/runtime/timestamp-value.h
+++ b/be/src/runtime/timestamp-value.h
@@ -95,22 +95,15 @@ class TimestampValue {
   /// Return the corresponding timestamp in the 'local_tz' time zone if
   /// FLAGS_use_local_tz_for_unix_timestamp_conversions is true. Otherwise, 
return the
   /// corresponding timestamp in UTC.
-  static TimestampValue FromUnixTime(time_t unix_time, const Timezone& 
local_tz) {
-    return TimestampValue(UnixTimeToPtime(unix_time, local_tz));
-  }
+  static TimestampValue FromUnixTime(time_t unix_time, const Timezone& 
local_tz);
 
   /// Same as FromUnixTime() above, but adds the specified number of 
nanoseconds to the
   /// resulting TimestampValue. Handles negative nanoseconds and the case where
   /// abs(nanos) >= 1e9.
   static TimestampValue FromUnixTimeNanos(time_t unix_time, int64_t nanos,
-      const Timezone& local_tz) {
-    boost::posix_time::ptime temp = UnixTimeToPtime(unix_time, local_tz);
-    temp += boost::posix_time::nanoseconds(nanos);
-    return TimestampValue(temp);
-  }
+      const Timezone& local_tz);
 
-  /// Return the corresponding timestamp in 'local_tz' time zone for the Unix 
time
-  /// specified in microseconds.
+  /// Same as FromUnixTime(), but expects the time in microseconds.
   static TimestampValue FromUnixTimeMicros(int64_t unix_time_micros,
       const Timezone& local_tz);
 
@@ -118,17 +111,15 @@ class TimestampValue {
   /// microseconds.
   static TimestampValue UtcFromUnixTimeMicros(int64_t unix_time_micros);
 
+  /// Return the corresponding timestamp in UTC for the Unix time specified in
+  /// milliseconds.
+  static TimestampValue UtcFromUnixTimeMillis(int64_t unix_time_millis);
+
   /// Returns a TimestampValue  in 'local_tz' time zone where the integer part 
of the
   /// specified 'unix_time' specifies the number of seconds (see above), and 
the
   /// fractional part is converted to nanoseconds and added to the resulting
   /// TimestampValue.
-  static TimestampValue FromSubsecondUnixTime(double unix_time,
-      const Timezone& local_tz) {
-    const time_t unix_time_whole = unix_time;
-    boost::posix_time::ptime temp = UnixTimeToPtime(unix_time_whole, local_tz);
-    temp += boost::posix_time::nanoseconds((unix_time - unix_time_whole) / 
ONE_BILLIONTH);
-    return TimestampValue(temp);
-  }
+  static TimestampValue FromSubsecondUnixTime(double unix_time, const 
Timezone& local_tz);
 
   /// Returns a TimestampValue converted from a TimestampVal. The caller must 
ensure
   /// the TimestampVal does not represent a NULL.
@@ -192,6 +183,13 @@ class TimestampValue {
         && date.day_number() <= MAX_DAY_NUMBER;
   }
 
+  /// Verifies that the time is not negative and is less then a whole day.
+  static inline bool IsValidTime(const boost::posix_time::time_duration& time) 
{
+    static const int64_t NANOS_PER_DAY = 1'000'000'000LL*60*60*24;
+    return !time.is_negative()
+        && time.total_nanoseconds() < NANOS_PER_DAY;
+  }
+
   /// Formats the timestamp using the given date/time context and places the 
result in the
   /// string buffer. The size of the buffer should be at least 
dt_ctx.fmt_out_len + 1. A
   /// string terminator will be appended to the string.
@@ -235,7 +233,7 @@ class TimestampValue {
   void LocalToUtc(const Timezone& local_tz);
 
   void set_date(const boost::gregorian::date d) { date_ = d; Validate(); }
-  void set_time(const boost::posix_time::time_duration t) { time_ = t; }
+  void set_time(const boost::posix_time::time_duration t) { time_ = t; 
Validate(); }
   const boost::gregorian::date& date() const { return date_; }
   const boost::posix_time::time_duration& time() const { return time_; }
 
@@ -276,6 +274,19 @@ class TimestampValue {
     return HashUtil::Hash(&date_, sizeof(date_), hash);
   }
 
+  /// Divides 'ticks' with 'GRANULARITY' (truncated towards negative infinity) 
and
+  /// sets 'ticks' to the remainder.
+  template <int64_t GRANULARITY>
+  inline static int64_t SplitTime(int64_t* ticks) {
+    int64_t result = *ticks / GRANULARITY;
+    *ticks %= GRANULARITY;
+    if (*ticks < 0) {
+      result--;
+      *ticks += GRANULARITY;
+    }
+    return result;
+  }
+
   static const char* LLVM_CLASS_NAME;
 
  private:
@@ -284,7 +295,6 @@ class TimestampValue {
   /// Used when converting a time with fractional seconds which are stored as 
in integer
   /// to a Unix time stored as a double.
   static const double ONE_BILLIONTH;
-
   /// Boost ptime leaves a gap in the structure, so we swap the order to make 
it
   /// 12 contiguous bytes.  We then must convert to and from the boost ptime 
data type.
   /// See IMP-87 for more information on why using ptime with the 4 byte gap is
@@ -303,24 +313,21 @@ class TimestampValue {
   }
 
   /// Sets both date and time to invalid if date is outside the valid range.
+  /// Time's validity is only checked in debug builds.
+  /// TODO: This could be also checked in release, but I am a bit afraid that 
it would
+  ///       affect performance and probably break some scenarios that are
+  ///       currently working more or less correctly.
   inline void Validate() {
     if (HasDate() && UNLIKELY(!IsValidDate(date_))) SetToInvalidDateTime();
+    else if (HasTime()) DCHECK(IsValidTime(time_));
   }
 
-  /// Return a ptime representation of the given Unix time (seconds since the 
Unix epoch).
-  /// The time zone of the resulting ptime is determined by
-  /// FLAGS_use_local_tz_for_unix_timestamp_conversions. If the flag is true, 
the value
-  /// will be in the 'local_tz' time zone. If the flag is false, the value 
will be in UTC.
-  static boost::posix_time::ptime UnixTimeToPtime(time_t unix_time,
-      const Timezone& local_tz);
-
-  /// Same as the above, but the time zone of the resulting ptime is always in 
the
-  /// 'local_tz' time zone.
-  static boost::posix_time::ptime UnixTimeToLocalPtime(time_t unix_time,
-      const Timezone& local_tz);
+  /// Converts 'unix_time' (in UTC seconds) to TimestampValue in timezone 
'local_tz'.
+  static TimestampValue UnixTimeToLocal(time_t unix_time, const Timezone& 
local_tz);
 
-  /// Same as the above, but the time zone of the resulting ptime is always in 
UTC.
-  static boost::posix_time::ptime UnixTimeToUtcPtime(time_t unix_time);
+  /// Converts 'unix_time_ticks'/TICKS_PER_SEC seconds to TimestampValue.
+  template <int32_t TICKS_PER_SEC>
+  static TimestampValue UtcFromUnixTimeTicks(int64_t unix_time_ticks);
 };
 
 /// This function must be called 'hash_value' to be picked up by boost.

http://git-wip-us.apache.org/repos/asf/impala/blob/2ee8caeb/be/src/runtime/timestamp-value.inline.h
----------------------------------------------------------------------
diff --git a/be/src/runtime/timestamp-value.inline.h 
b/be/src/runtime/timestamp-value.inline.h
index e049147..9c4b156 100644
--- a/be/src/runtime/timestamp-value.inline.h
+++ b/be/src/runtime/timestamp-value.inline.h
@@ -30,23 +30,53 @@
 
 namespace impala {
 
+template <int32_t TICKS_PER_SEC>
+inline TimestampValue TimestampValue::UtcFromUnixTimeTicks(int64_t 
unix_time_ticks) {
+  static const boost::gregorian::date EPOCH(1970,1,1);
+  int64_t days = SplitTime<(uint64_t)TICKS_PER_SEC*24*60*60>(&unix_time_ticks);
+
+  return TimestampValue(EPOCH + boost::gregorian::date_duration(days),
+      
boost::posix_time::nanoseconds(unix_time_ticks*(NANOS_PER_SEC/TICKS_PER_SEC)));
+}
+
 inline TimestampValue TimestampValue::UtcFromUnixTimeMicros(int64_t 
unix_time_micros) {
-  int64_t ts_seconds = unix_time_micros / MICROS_PER_SEC;
-  int64_t micros_part = unix_time_micros - (ts_seconds * MICROS_PER_SEC);
-  boost::posix_time::ptime temp = UnixTimeToUtcPtime(ts_seconds);
-  temp += boost::posix_time::microseconds(micros_part);
-  return TimestampValue(temp);
+  return UtcFromUnixTimeTicks<MICROS_PER_SEC>(unix_time_micros);
 }
 
 inline TimestampValue TimestampValue::FromUnixTimeMicros(int64_t 
unix_time_micros,
     const Timezone& local_tz) {
-  int64_t ts_seconds = unix_time_micros / MICROS_PER_SEC;
-  int64_t micros_part = unix_time_micros - (ts_seconds * MICROS_PER_SEC);
-  boost::posix_time::ptime temp = UnixTimeToLocalPtime(ts_seconds, local_tz);
-  temp += boost::posix_time::microseconds(micros_part);
-  return TimestampValue(temp);
+  int64_t ts_seconds = SplitTime<MICROS_PER_SEC>(&unix_time_micros);
+  TimestampValue result = FromUnixTime(ts_seconds, local_tz);
+  if (result.HasDate()) result.time_ += 
boost::posix_time::microseconds(unix_time_micros);
+  return result;
 }
 
+inline TimestampValue TimestampValue::UtcFromUnixTimeMillis(int64_t 
unix_time_millis) {
+  return UtcFromUnixTimeTicks<MILLIS_PER_SEC>(unix_time_millis);
+}
+
+inline TimestampValue TimestampValue::FromSubsecondUnixTime(
+    double unix_time, const Timezone& local_tz) {
+  int64_t unix_time_whole = unix_time;
+  int64_t nanos = (unix_time - unix_time_whole) / ONE_BILLIONTH;
+  return FromUnixTimeNanos(unix_time_whole, nanos, local_tz);
+}
+
+inline TimestampValue TimestampValue::FromUnixTimeNanos(time_t unix_time, 
int64_t nanos,
+    const Timezone& local_tz) {
+  unix_time += SplitTime<NANOS_PER_SEC>(&nanos);
+  TimestampValue result = FromUnixTime(unix_time, local_tz);
+  // 'nanos' is guaranteed to be between [0,NANOS_PER_SEC) at this point, so 
the
+  // next addition cannot change the day or step to a different timezone.
+  if (result.HasDate()) {
+    DCHECK_GE(nanos, 0);
+    DCHECK_LT(nanos, NANOS_PER_SEC);
+    result.time_ += boost::posix_time::nanoseconds(nanos);
+  }
+  return result;
+}
+
+
 /// Interpret 'this' as a timestamp in UTC and convert to unix time.
 /// Returns false if the conversion failed ('unix_time' will be undefined), 
otherwise
 /// true.

Reply via email to