imay commented on a change in pull request #832: Support hll_raw_agg in Aggregate Function
URL: https://github.com/apache/incubator-doris/pull/832#discussion_r270696881
 
 

 ##########
 File path: docs/help/Contents/Data Definition/ddl_stmt.md
 ##########
 @@ -952,38 +955,40 @@
 
       b. Generate an hll column from a column in the data
         curl --location-trusted -uname:password -T data http://host/api/test_db/test/_load?label=load_1\&hll=set1,cuid:set2,os
-            \&columns=time,id,name,province,sex,cuid,os
+            \&columns=dt,id,name,province,sex,cuid,os
 
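     3. Aggregate the data. There are three common approaches. (If you query the base table directly without aggregating, the speed may be about the same as using ndv directly.)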
 
       a. Create a rollup so that the hll column is aggregated:
-        alter table test add rollup test_rollup(date, set1);
+        alter table test add rollup test_rollup(dt, set1);
         
       b. Create a separate table dedicated to computing UV, then insert the data into it:
     
         create table test_uv(
-        time date,
+        dt date,
         uv_set hll hll_union)
         distributed by hash(id) buckets 32;
 
-        insert into test_uv select date, set1 from test;
+        insert into test_uv select dt, set1 from test;
         
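       c. Create a separate table dedicated to computing UV, then insert into it, using hll_hash to generate the hll column from one of the other, non-hll columns of test: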
       
         create table test_uv(
-        time date,
+        dt date,
         id_set hll hll_union)
         distributed by hash(id) buckets 32;
         
-        insert into test_uv select date, hll_hash(id) from test;
+        insert into test_uv select dt, hll_hash(id) from test;
             
     4. Query. The raw value of an hll column cannot be queried directly; use the accompanying functions instead:
     
       a. Compute the total UV
         select HLL_UNION_AGG(uv_set) from test_uv;
             
       b. Compute the UV for each day
-        select HLL_CARDINALITY(uv_set) from test_uv;
+        select dt, HLL_CARDINALITY(uv_set) from test_uv;
+        select dt, HLL_CARDINALITY(uv) from (select dt, HLL_RAW_AGG(set1) as uv from test group by dt) tmp;
+        select dt, HLL_UNION_AGG(set1) as uv from test group by dt;
 
 Review comment:
   I think these three queries don't produce the same result.
   
   Query 1 would return many rows for one `dt` value.
   Queries 2 and 3 group by the `dt` column and return one row for each value.
   
   So you should split them into separate examples.
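To make the row-count difference concrete, here is a minimal Python sketch that simulates the queries. It assumes exact Python sets as a stand-in for hll values (so counts are exact rather than estimated), models all three queries over the same stored rows (ignoring that query 1 reads `test_uv` while queries 2 and 3 read `test`), and assumes two stored rows share the same `dt`. The row data and the helper names `per_row_cardinality` and `grouped_cardinality` are hypothetical.

```python
from collections import defaultdict

# Hypothetical stored rows of (dt, set of user ids).
# Exact sets stand in for hll values; this is an assumption, not Doris internals.
ROWS = [
    ("2019-03-01", {"u1", "u2"}),
    ("2019-03-01", {"u2", "u3"}),  # a second stored row with the same dt
    ("2019-03-02", {"u1"}),
]


def per_row_cardinality(rows):
    """Like query 1: one cardinality per stored row, no GROUP BY."""
    return [(dt, len(ids)) for dt, ids in rows]


def grouped_cardinality(rows):
    """Like queries 2 and 3: union the values of each dt, then count distinct ids."""
    merged = defaultdict(set)
    for dt, ids in rows:
        merged[dt] |= ids
    return sorted((dt, len(ids)) for dt, ids in merged.items())


if __name__ == "__main__":
    print(per_row_cardinality(ROWS))  # one output row per stored row
    print(grouped_cardinality(ROWS))  # one output row per dt
```

In this sketch the per-row form yields two rows for 2019-03-01 whose counts (2 and 2) cannot simply be added, while the grouped form yields a single row with the distinct count of 3, which is why the queries belong in separate examples.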

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.


With regards,
Apache Git Services