Fucun Chu created IMPALA-10864:
----------------------------------

             Summary: Optimize ds_hll_sketch() function
                 Key: IMPALA-10864
                 URL: https://issues.apache.org/jira/browse/IMPALA-10864
             Project: IMPALA
          Issue Type: Improvement
          Components: Backend
            Reporter: Fucun Chu


[https://lists.apache.org/thread.html/r2591247f7ca0813a0c80fb0d06b4b8fd614298ea44ea730f6ed7cfe1%40%3Cdev.impala.apache.org%3E]

 

??Regarding deserialization. I see in some cases that a sketch constructor is 
called just to replace this instance with a deserialized one. This extra 
construction seems unnecessary.??

??[https://github.com/apache/impala/blob/3d365276ea00f349df3629944b731eb4408d2c4f/be/src/exprs/aggregate-functions-ir.cc#L1819]??
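To illustrate the deserialization point, here is a minimal sketch. It does not use Impala or DataSketches code: ToySketch, g_sized_constructions, LoadWithExtraConstruction and LoadDirectly are all hypothetical names made up for this example. It only shows that constructing a sized instance and then overwriting it with a deserialized one pays for a construction whose result is thrown away.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Counts how often the "expensive" sizing constructor runs (illustration only).
int g_sized_constructions = 0;

// ToySketch is a hypothetical stand-in for a real sketch type.
struct ToySketch {
  std::vector<uint8_t> registers;

  // Allocates 2^lg_k registers, mimicking a non-trivial constructor.
  explicit ToySketch(int lg_k) : registers(std::size_t{1} << lg_k, 0) {
    ++g_sized_constructions;
  }

  // Builds the object straight from serialized bytes, no sizing constructor.
  static ToySketch Deserialize(const std::vector<uint8_t>& bytes) {
    return ToySketch(bytes);
  }

 private:
  explicit ToySketch(const std::vector<uint8_t>& bytes) : registers(bytes) {}
};

// The pattern described in the quote: construct, then replace.
ToySketch LoadWithExtraConstruction(const std::vector<uint8_t>& bytes) {
  ToySketch sketch(12);                    // allocated, then discarded
  sketch = ToySketch::Deserialize(bytes);  // overwrites the fresh instance
  assert(sketch.registers.size() == bytes.size());
  return sketch;
}

// The suggested alternative: initialize from the deserialized value.
ToySketch LoadDirectly(const std::vector<uint8_t>& bytes) {
  return ToySketch::Deserialize(bytes);
}
```

Both functions return the same sketch contents; only the first one runs the sizing constructor.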

??Looking at this DsHllMerge function??

??[https://github.com/apache/impala/blob/3d365276ea00f349df3629944b731eb4408d2c4f/be/src/exprs/aggregate-functions-ir.cc#L1759]??

??it seems that the merge is done pairwise. Is it possible to arrange this 
process as init, multiple merges and finalize (serialize) at the end? It is 
quite costly to initialize a union, update it with two sketches and then call 
get_result(). If many such merges happen, the overhead of initializing a fresh 
union and finalizing it for each pair can be substantial.??
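The difference between the two arrangements can be sketched as follows. This is a toy model under stated assumptions, not the Impala or DataSketches implementation: ToyUnion, Registers, MergePairwise, MergeWithSingleUnion and the two counters are hypothetical, and a "sketch" here is just a vector of registers merged by element-wise maximum. The counters stand in for the cost of initializing and finalizing a union.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Registers = std::vector<uint8_t>;  // toy sketch: one byte per register

int g_union_inits = 0;      // counts ToyUnion constructions
int g_union_finalizes = 0;  // counts GetResult() calls

// ToyUnion is a hypothetical stand-in for an HLL union object.
class ToyUnion {
 public:
  explicit ToyUnion(std::size_t num_registers) : regs_(num_registers, 0) {
    ++g_union_inits;
  }

  // Merges one sketch into the union by taking the per-register maximum.
  void Update(const Registers& sketch) {
    for (std::size_t i = 0; i < regs_.size() && i < sketch.size(); ++i) {
      regs_[i] = std::max(regs_[i], sketch[i]);
    }
  }

  // Finalizes the union into a result sketch (copies the registers).
  Registers GetResult() const {
    ++g_union_finalizes;
    return regs_;
  }

 private:
  Registers regs_;
};

// Pairwise: a fresh union is initialized and finalized for every input.
Registers MergePairwise(const std::vector<Registers>& inputs, std::size_t n) {
  Registers acc(n, 0);
  for (const Registers& in : inputs) {
    ToyUnion u(n);
    u.Update(acc);
    u.Update(in);
    acc = u.GetResult();
  }
  return acc;
}

// Init once, merge many times, finalize once at the end.
Registers MergeWithSingleUnion(const std::vector<Registers>& inputs,
                               std::size_t n) {
  ToyUnion u(n);
  for (const Registers& in : inputs) u.Update(in);
  return u.GetResult();
}
```

For N inputs the pairwise version performs N union initializations and N finalizations, while the single-union version performs one of each and yields the same registers.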



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
