Fucun Chu created IMPALA-10864:
----------------------------------

             Summary: Optimize ds_hll_sketch() function
                 Key: IMPALA-10864
                 URL: https://issues.apache.org/jira/browse/IMPALA-10864
             Project: IMPALA
          Issue Type: Improvement
          Components: Backend
            Reporter: Fucun Chu
[https://lists.apache.org/thread.html/r2591247f7ca0813a0c80fb0d06b4b8fd614298ea44ea730f6ed7cfe1%40%3Cdev.impala.apache.org%3E]

??Regarding deserialization. I see in some cases that a sketch constructor is called just to replace this instance with a deserialized one. This extra construction seems unnecessary.??
??[https://github.com/apache/impala/blob/3d365276ea00f349df3629944b731eb4408d2c4f/be/src/exprs/aggregate-functions-ir.cc#L1819]??

??Looking at this DsHllMerge function [https://github.com/apache/impala/blob/3d365276ea00f349df3629944b731eb4408d2c4f/be/src/exprs/aggregate-functions-ir.cc#L1759], it seems that the merge is done pairwise. Is it possible to arrange this process as init, multiple merges and finalize (serialize) at the end? It is quite costly to initialize a union, update it with two sketches and then call get_result(). If many such merges happen, the overhead of initializing a fresh union and finalizing it for each pair can be substantial.??

--
This message was sent by Atlassian Jira
(v8.3.4#803005)