[ https://issues.apache.org/jira/browse/IMPALA-9633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gabor Kaszab updated IMPALA-9633:
---------------------------------
    Description: 
ds_hll_union() is an aggregate function that accepts HLL sketches and produces a
single sketch that is the union of the input sketches.
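To illustrate what taking the union of sketches means, here is a minimal Python sketch of a toy HyperLogLog. It assumes the classic HLL register layout and is not the Apache DataSketches code that the ds_hll_* functions are based on. ToyHllSketch, hll_union and lg_k are hypothetical names used only for illustration. Merging two HLL sketches takes the maximum of each register, so the merged sketch matches one built from all rows at once, and ids that appear in several input sketches are not double-counted.

```python
import hashlib
import math


class ToyHllSketch:
    """Hypothetical toy HyperLogLog sketch with 2**lg_k registers (illustration only)."""

    def __init__(self, lg_k: int = 12):
        self.lg_k = lg_k
        self.m = 1 << lg_k
        self.registers = [0] * self.m

    def update(self, item) -> None:
        # 64-bit hash taken from the first 8 bytes of SHA-256 of the item's text.
        h = int.from_bytes(hashlib.sha256(str(item).encode()).digest()[:8], "big")
        rest_bits = 64 - self.lg_k
        idx = h >> rest_bits                      # top lg_k bits select a register
        rest = h & ((1 << rest_bits) - 1)         # remaining hash bits
        rank = rest_bits - rest.bit_length() + 1  # leading zeros + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def estimate(self) -> float:
        alpha = 0.7213 / (1 + 1.079 / self.m)
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction (linear counting).
            return self.m * math.log(self.m / zeros)
        return raw


def hll_union(sketches) -> ToyHllSketch:
    """Hypothetical merge: register-wise maximum over all input sketches."""
    sketches = list(sketches)
    merged = ToyHllSketch(sketches[0].lg_k)
    for s in sketches:
        if s.lg_k != merged.lg_k:
            raise ValueError("all sketches must use the same lg_k")
        merged.registers = [max(a, b) for a, b in zip(merged.registers, s.registers)]
    return merged


# Per-category sketches mirroring the test data: ids 1-10 in 'a', 6-15 in 'b'.
sketch_a, sketch_b = ToyHllSketch(), ToyHllSketch()
for i in range(1, 11):
    sketch_a.update(i)
for i in range(6, 16):
    sketch_b.update(i)

union = hll_union([sketch_a, sketch_b])
print(round(union.estimate(), 2))
```

Because update() only ever raises a register to a maximum, the register-wise max of the per-category sketches is exactly the sketch of all ids, which is why a union aggregate can combine partial sketches in any order.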

Example from Hive:
{code:sql}
create temporary table sketch_intermediate (category char(1), sketch binary);
insert into sketch_intermediate
select category, ds_hll_sketch(id) from sketch_input group by category;
select ds_hll_estimate(ds_hll_union(sketch)) from sketch_intermediate;
{code}
Test data for the example (create and populate sketch_input before running the queries above):
{code:sql}
create temporary table sketch_input (id int, category char(1));
insert into table sketch_input values
  (1, 'a'), (2, 'a'), (3, 'a'), (4, 'a'), (5, 'a'),
  (6, 'a'), (7, 'a'), (8, 'a'), (9, 'a'), (10, 'a'),
  (6, 'b'), (7, 'b'), (8, 'b'), (9, 'b'), (10, 'b'),
  (11, 'b'), (12, 'b'), (13, 'b'), (14, 'b'), (15, 'b');
{code}
Approximate result (the test data contains 15 distinct ids):
{code}
15.000000521540663
{code}
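As a sanity check on the result, the following Python snippet computes exact counts from the test data. Summing the per-category distinct counts gives 20 because ids 6 to 10 appear in both categories; the exact number of distinct ids across both categories is 15, which is the value the estimate above approximates.

```python
# Test data from the description: (id, category) rows.
rows = [(i, "a") for i in range(1, 11)] + [(i, "b") for i in range(6, 16)]

ids_by_category = {}
for row_id, category in rows:
    ids_by_category.setdefault(category, set()).add(row_id)

# Adding per-category distinct counts counts ids 6-10 twice.
naive_sum = sum(len(ids) for ids in ids_by_category.values())
# The distinct count over the union of the id sets is what
# ds_hll_estimate(ds_hll_union(...)) approximates.
exact_distinct = len(set().union(*ids_by_category.values()))

print(naive_sum, exact_distinct)  # 20 15
```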

Hive change that introduced the same function:
https://issues.apache.org/jira/browse/HIVE-22940

  was:
ds_hll_union() is an aggregate function that accepts HLL sketches and produces a
single sketch that is the union of the input sketches.

Example from Hive:
{code:sql}
create temporary table sketch_intermediate (category char(1), sketch binary);
insert into sketch_intermediate
select category, ds_hll_sketch(id) from sketch_input group by category;
select ds_hll_estimate(ds_hll_union(sketch)) from sketch_intermediate;
{code}

Some test data for the example:
{code:sql}
create temporary table sketch_input (id int, category char(1));
insert into table sketch_input values
  (1, 'a'), (2, 'a'), (3, 'a'), (4, 'a'), (5, 'a'),
  (6, 'a'), (7, 'a'), (8, 'a'), (9, 'a'), (10, 'a'),
  (6, 'b'), (7, 'b'), (8, 'b'), (9, 'b'), (10, 'b'),
  (11, 'b'), (12, 'b'), (13, 'b'), (14, 'b'), (15, 'b');
{code}

Approximate result:
{code:java}
15.000000521540663
{code}



> Implement ds_hll_union() builtin function
> -----------------------------------------
>
>                 Key: IMPALA-9633
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9633
>             Project: IMPALA
>          Issue Type: New Feature
>          Components: Backend, Frontend
>            Reporter: Gabor Kaszab
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
