[ 
https://issues.apache.org/jira/browse/IMPALA-7085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Csaba Ringhofer updated IMPALA-7085:
------------------------------------
    Labels: Timestamp performance  (was: Timestamp perfomance)

> Consider patching Google/CCTZ for Impala's needs
> ------------------------------------------------
>
>                 Key: IMPALA-7085
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7085
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>            Reporter: Csaba Ringhofer
>            Priority: Major
>              Labels: Timestamp, performance
>
> Google/CCTZ was chosen as Impala's new timezone database in IMPALA-3307 and 
> benchmarks have shown significant improvements compared to the previous libc 
> / boost solution (see  convert-timestamp-benchmark.cc  in 
> https://gerrit.cloudera.org/#/c/9986/). If further optimizations are needed, 
> it may become necessary to modify Google/CCTZ to serve Impala's needs with 
> less overhead.
> I see 3 points where avoidable overhead is added to UTC<->local conversions:
> 1. Timezone conversions use atomic integer hints that speed up rule lookup if 
> the timestamp is close to the one in the last call ( see 
> https://github.com/google/cctz/blob/a2dd3d0fbc811fe0a1d4d2dbb0341f1a3d28cb2a/src/time_zone_info.cc#L828
>  ). This may be detrimental if the function is called from different threads 
> with far away timestamps. As the same class that contains hints contains the 
> vector of rule transitions (which can be large), creating a copy for every 
> thread would be expensive.
>  The solution would be to move the hints to a new class whose instances would 
> not be shared between threads.
> 2. CCTZ handles local time in civil_seconds, which contains 
> year/month/day/hour/minutes/seconds in separate fields, while Impala stores 
> timestamps in "days since some epoch" (boost::gregorian::date)/"nanoseconds 
> till midnight" (boost::posix_time::time_duration). A modified version of 
> TimeZoneInfo::MakeTime() could skip this back and forth conversion.
> 3. Timezone conversions use a virtual function, while the backing class is 
> always https://github.com/google/cctz/blob/master/src/time_zone_info.h.
>  This could be probably avoided without modifying CCTZ by including some of 
> its headers in https://github.com/google/cctz/tree/master/src



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to