Csaba Ringhofer created IMPALA-7085:
---------------------------------------

             Summary: Consider patching Google/CCTZ for Impala's needs
                 Key: IMPALA-7085
                 URL: https://issues.apache.org/jira/browse/IMPALA-7085
             Project: IMPALA
          Issue Type: Improvement
            Reporter: Csaba Ringhofer


Google/CCTZ was chosen as Impala's new timezone database in IMPALA-3307 and 
benchmarks have shown significant improvements compared to the previous libc / 
boost solution (see  convert-timestamp-benchmark.cc  in 
https://gerrit.cloudera.org/#/c/9986/). If further optimizations are needed, it 
may become necessary to modify Google/CCTZ to serve Impala's needs with less 
overhead.

I see 3 points where avoidable overhead is added to UTC<->local conversions:

1. Timezone conversions use atomic integer hints that speed up rule lookup if 
the timestamp is close to the one in the last call ( see 
https://github.com/google/cctz/blob/a2dd3d0fbc811fe0a1d4d2dbb0341f1a3d28cb2a/src/time_zone_info.cc#L828
 ). This may be detrimental if the function is called from different threads 
with far away timestamps. As the same class that contains hints contains the 
vector of rule transitions (which can be large), creating a copy for every 
thread would be expensive.
 The solution would be to move the hints to a new class whose instances would 
not be shared between threads.

2. CCTZ handles local time in civil_seconds, which contains 
year/month/day/hour/minutes/seconds in separate fields, while Impala stores 
timestamps in "days since some epoch" (boost::gregorian::date)/"nanoseconds 
till midnight" (boost::posix_time::time_duration). A modified version of 
TimeZoneInfo::MakeTime() could skip this back and forth conversion.

3. Timezone conversions use a virtual function, while the backing class is 
always https://github.com/google/cctz/blob/master/src/time_zone_info.h.
 This could be probably avoided without modifying CCTZ by including some of its 
headers in https://github.com/google/cctz/tree/master/src





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to