[ 
https://issues.apache.org/jira/browse/IMPALA-10017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17175210#comment-17175210
 ] 

ASF subversion and git services commented on IMPALA-10017:
----------------------------------------------------------

Commit f95f7940e4a290d75ee85fd78e85bc26795f0f9f in impala's branch 
refs/heads/master from Gabor Kaszab
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=f95f794 ]

IMPALA-10017: Implement ds_kll_union() function

This function receives a set of serialized Apache DataSketches KLL
sketches produced by ds_kll_sketch() and merges them into a single
sketch.

One use case is to create a sketch for each partition of a table and
write these sketches to a separate table. The sketches of the
partitions the user is interested in can then be unioned together to
get an estimate. E.g.:
  SELECT
      ds_kll_quantile(ds_kll_union(sketch_col), 0.5)
  FROM sketch_tbl
  WHERE partition_col=1 OR partition_col=5;
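
A minimal sketch of how the sketch table queried above could be
populated, one sketch per partition. Only sketch_tbl, sketch_col,
partition_col and ds_kll_sketch() come from this message; src_tbl,
val_col, the column types, the CAST to FLOAT and the Parquet storage
format are assumptions made for illustration.

```sql
-- Hypothetical layout for the table that holds the serialized sketches.
-- Parquet is an assumption, chosen so the serialized bytes are stored
-- without going through a text encoding.
CREATE TABLE sketch_tbl (partition_col INT, sketch_col STRING)
STORED AS PARQUET;

-- Build one KLL sketch per partition of the (hypothetical) source table.
INSERT INTO sketch_tbl
SELECT
    partition_col,
    -- The CAST assumes ds_kll_sketch() expects a FLOAT input.
    ds_kll_sketch(CAST(val_col AS FLOAT))
FROM src_tbl
GROUP BY partition_col;
```

With such a table in place, the ds_kll_union() query only has to read
the stored sketches of the selected partitions instead of rescanning
the source data.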

Testing:
  - Apart from the automated tests added in this patch, I also
    tested ds_kll_union() on a bigger dataset to check that the
    serialization, deserialization and merging steps work well. I
    took TPCH25.lineitem, created a number of sketches grouped by
    l_shipdate and called ds_kll_union() on those sketches.

Change-Id: I020aea28d36f9b6ef9fb57c08411f2170f5c24bf
Reviewed-on: http://gerrit.cloudera.org:8080/16267
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>


> Implement ds_kll_union() function
> ---------------------------------
>
>                 Key: IMPALA-10017
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10017
>             Project: IMPALA
>          Issue Type: New Feature
>          Components: Backend, Frontend
>            Reporter: Gabor Kaszab
>            Assignee: Gabor Kaszab
>            Priority: Major
>             Fix For: Impala 4.0
>
>
> Similarly to https://issues.apache.org/jira/browse/IMPALA-9633, DataSketches 
> KLL also has union functionality.
> ds_kll_union() expects a dataset of STRINGs (e.g. a column of a table) that 
> represent serialized KLL sketches and returns a single STRING containing the 
> union of the received sketches in serialized form.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)