Since the tutorial "Quick Start with Sample Cube" suggests verifying the query
results, I went one step further and verified my own query in both Hive and
Kylin. See below.


I tested the following query in Hive and Kylin:

In my HDP 2.4 VM, the query took 30 seconds in Hive and 1 second in Kylin.
Wow, I did notice the fast response in Kylin; however, the distinct
seller_id counts are different: 68 in Hive and 65 in Kylin for week_beg_dt
2012-01-01.


Query:

SELECT
        c.week_beg_dt, sum(price) as total_selled, count(distinct seller_id) as sellers
FROM
        kylin_sales s
        JOIN kylin_cal_dt c ON s.part_dt = c.cal_dt
GROUP BY
        c.week_beg_dt
ORDER BY
        c.week_beg_dt;
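For reference, here is a minimal sketch of what an exact count(distinct seller_id) per week should return, using Python's built-in sqlite3 module. The table and column names follow the query above, but the rows are made-up toy data (not the real Kylin sample data), and SQLite stands in for Hive purely as an assumption for illustration.

```python
import sqlite3

# Hypothetical toy rows; NOT the real KYLIN_SALES / KYLIN_CAL_DT sample data.
CAL_DT = [
    # (cal_dt, week_beg_dt)
    ("2012-01-01", "2012-01-01"),
    ("2012-01-02", "2012-01-01"),
    ("2012-01-08", "2012-01-08"),
]
SALES = [
    # (part_dt, seller_id, price)
    ("2012-01-01", 10000015, 10.0),
    ("2012-01-02", 10000015, 5.0),  # same seller again in the same week
    ("2012-01-02", 10000017, 2.5),
    ("2012-01-08", 10000028, 7.0),
]

# Same shape as the query in the post.
QUERY = """
SELECT c.week_beg_dt, sum(price) AS total_selled, count(DISTINCT seller_id) AS sellers
FROM kylin_sales s
JOIN kylin_cal_dt c ON s.part_dt = c.cal_dt
GROUP BY c.week_beg_dt
ORDER BY c.week_beg_dt
"""


def weekly_sellers(sales, cal_dt):
    """Load the toy rows into an in-memory SQLite DB and run the weekly aggregation."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE kylin_cal_dt (cal_dt TEXT, week_beg_dt TEXT)")
        conn.execute("CREATE TABLE kylin_sales (part_dt TEXT, seller_id INTEGER, price REAL)")
        conn.executemany("INSERT INTO kylin_cal_dt VALUES (?, ?)", cal_dt)
        conn.executemany("INSERT INTO kylin_sales VALUES (?, ?, ?)", sales)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in weekly_sellers(SALES, CAL_DT):
        print(row)
```

Seller 10000015 sells twice in the week of 2012-01-01 but is counted once, so an exact engine reports ("2012-01-01", 17.5, 2) and ("2012-01-08", 7.0, 1) for this toy data.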


Picture from Hive:

<http://apache-kylin.74782.x6.nabble.com/file/n6808/Hive_Result.png> 
Picture from Kylin: 

<http://apache-kylin.74782.x6.nabble.com/file/n6808/Kylin_Result.png> 

I checked the detailed rows in Kylin and counted 68 sellers, not 65. Why does
count(distinct seller_id) return different numbers?


Kylin details:

select seller_id, sum(price) as total_selled from kylin_sales
WHERE part_dt >= '2012-01-01' AND part_dt <= '2012-01-05'
group by seller_id order by seller_id;


SELLER_ID       TOTAL_SELLED
10000015        76.301
10000017        2.922
10000028        91.734
10000033        87.773
10000054        22.894
10000075        3.045
10000076        13.773
10000090        12.371
10000095        84.876
10000119        21.522
10000141        32.2
10000148        96.598
10000151        21.334
10000165        55.196
10000167        34.805
10000173        65.986
10000186        71.938
10000206        77.638
10000217        30.971
10000234        88.069
10000254        4.216
10000263        16.727
10000266        37.082
10000267        89.793
10000288        96.843
10000306        37.318
10000326        34.194
10000335        65.552
10000360        72.086
10000365        40.785
10000375        72.372
10000403        71.565
10000439        44.749
10000443        4.482
10000467        84.5
10000482        45.369
10000509        94.175
10000514        82.234
10000537        40.129
10000545        23.949
10000553        87.856
10000555        99.394
10000565        12.702
10000566        88.427
10000567        83.306
10000699        63.295
10000705        69.248
10000728        32.516
10000730        95.506
10000731        48.243
10000739        90.372
10000743        157.589
10000755        28.416
10000774        0.723
10000785        45.816
10000794        54.281
10000821        92.507
10000824        86.434
10000828        13.281
10000830        80.148
10000831        16.978
10000859        93.922
10000902        19.543
10000905        34.131
10000933        47.99
10000951        23.107
10000991        4.212
10000993        20.504
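To count the distinct seller IDs in a pasted result like this one without doing it by hand, a small Python helper along these lines could be used. It assumes the text has a SELLER_ID / TOTAL_SELLED header followed by whitespace-separated rows, possibly with blank lines in between as in the original paste; the function name count_distinct_sellers is hypothetical.

```python
def count_distinct_sellers(pasted: str) -> int:
    """Count distinct SELLER_ID values in a pasted 'SELLER_ID TOTAL_SELLED' listing."""
    sellers = set()
    for line in pasted.splitlines():
        fields = line.split()
        # Skip blank lines and the header row; keep rows whose first field is all digits.
        if len(fields) >= 2 and fields[0].isdigit():
            sellers.add(fields[0])
    return len(sellers)


# First few rows of the listing, pasted with blank lines as in the original message.
SAMPLE = """
SELLER_ID       TOTAL_SELLED

10000015        76.301

10000017        2.922

10000028        91.734
"""

if __name__ == "__main__":
    print(count_distinct_sellers(SAMPLE))  # prints 3
```

Pasting the full listing into it should return 68, the same count Hive reports for that week.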



--
View this message in context: 
http://apache-kylin.74782.x6.nabble.com/Anomaly-in-Aggregation-tp6808.html
Sent from the Apache Kylin mailing list archive at Nabble.com.
