The tutorial "Quick Start with Sample Cube" suggests verifying the query
results, so I went one step further and ran my own query in both Hive and
Kylin. See below.
I tested the query below in Hive and Kylin.
In my HDP 2.4 VM, it took 30 seconds in Hive and 1 second in Kylin.
I did notice the fast response in Kylin; however, the distinct seller_id
counts are different: 68 in Hive and 65 in Kylin for week_beg_dt
2012-01-01.
Query:
SELECT
  c.week_beg_dt,
  sum(price) as total_selled,
  count(distinct seller_id) as sellers
FROM
  kylin_sales s
  JOIN kylin_cal_dt c ON s.part_dt = c.cal_dt
GROUP BY
  c.week_beg_dt
ORDER BY
  c.week_beg_dt;
Picture from Hive:
<http://apache-kylin.74782.x6.nabble.com/file/n6808/Hive_Result.png>
Picture from Kylin:
<http://apache-kylin.74782.x6.nabble.com/file/n6808/Kylin_Result.png>
I checked the detailed rows in Kylin and got 68, not 65. Why does
count(distinct seller_id) return a different number?
Kylin details
select seller_id, sum(price) as total_selled
from kylin_sales
where part_dt >= '2012-01-01' AND part_dt <= '2012-01-05'
group by seller_id
order by seller_id;
SELLER_ID TOTAL_SELLED
10000015 76.301
10000017 2.922
10000028 91.734
10000033 87.773
10000054 22.894
10000075 3.045
10000076 13.773
10000090 12.371
10000095 84.876
10000119 21.522
10000141 32.2
10000148 96.598
10000151 21.334
10000165 55.196
10000167 34.805
10000173 65.986
10000186 71.938
10000206 77.638
10000217 30.971
10000234 88.069
10000254 4.216
10000263 16.727
10000266 37.082
10000267 89.793
10000288 96.843
10000306 37.318
10000326 34.194
10000335 65.552
10000360 72.086
10000365 40.785
10000375 72.372
10000403 71.565
10000439 44.749
10000443 4.482
10000467 84.5
10000482 45.369
10000509 94.175
10000514 82.234
10000537 40.129
10000545 23.949
10000553 87.856
10000555 99.394
10000565 12.702
10000566 88.427
10000567 83.306
10000699 63.295
10000705 69.248
10000728 32.516
10000730 95.506
10000731 48.243
10000739 90.372
10000743 157.589
10000755 28.416
10000774 0.723
10000785 45.816
10000794 54.281
10000821 92.507
10000824 86.434
10000828 13.281
10000830 80.148
10000831 16.978
10000859 93.922
10000902 19.543
10000905 34.131
10000933 47.99
10000951 23.107
10000991 4.212
10000993 20.504
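
To reduce the listing above to a single number that can be compared with the
count(distinct seller_id) result, the detail query could be wrapped in an
outer count. This is only a sketch: it reuses the table, column, and date
range of the detail query above, the alias t is arbitrary, and it assumes the
engine accepts a subquery in the FROM clause.

```sql
-- Count the sellers in the same date range as the detail query.
-- The inner query returns one row per seller_id, so the outer count(*)
-- is the number of distinct sellers in that range.
select count(*) as sellers
from (
  select seller_id
  from kylin_sales
  where part_dt >= '2012-01-01' AND part_dt <= '2012-01-05'
  group by seller_id
) t;
```

Running the same statement in Hive and in Kylin would show whether the two
engines agree when the distinct count is computed through a group by.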
--
View this message in context:
http://apache-kylin.74782.x6.nabble.com/Anomaly-in-Aggregation-tp6808.html