Sorry hongbin, it's my fault, for the 2 question, the row count is LESS THAN 50000
2015-03-05 11:28 GMT+08:00 dong wang <[email protected]>: > HI hongbin, > 1, yes, as you guessed, the result of the above query is CORRECT if we > use ASC order, thus, the issue only occurs when DESC order. > 2, the total row count of the result is GREATER THAN 50000 definitely, > even much more > > 2015-03-05 11:06 GMT+08:00 hongbin ma <[email protected]>: > >> for #2 to #4, will Kylin return correct answers if order by ascending >> order? >> And what it the total result count of #4? >> >> On Thu, Mar 5, 2015 at 10:56 AM, Zhou, Qianhao <[email protected]> wrote: >> >> > #1 when add condition for date type, you should add date before the time >> > string >> > >> > the sql should be like: >> > select my_date, sum(f1), sum(f2), sum(f3) from test where >> > my_date >= date'2013-01-01' and my_date <= date'2015-01-31' >> > >> > >> > >> > Best Regard >> > Zhou QianHao >> > >> > >> > >> > >> > >> > On 3/5/15, 10:40 AM, "dong wang" <[email protected]> wrote: >> > >> > >Hi all, I have experienced the wrong result for quite a long time since >> > >kylin-0.6x, thus, really hope that someone can do a help. >> > > >> > >CUBE: >> > >1, the cube is based on just one big fact table. there are 10 >> attributes, >> > >and 3 metrics. >> > >for exmaple: my_date, col2, col3, col4, col5, col6, col7, col8, col9, >> > >col10, sum(f1), sum(f2), sum(f2) >> > > >> > >2, my_date, col2 is normal columns, and my_date is the partition >> > >attribute, >> > >which is date type in HIVE, >> > > >> > >3, (col3, col4), (col5, col6), (col7, col8), (col9, col10) are 4 >> > >hirerarchies, and col3, col5, col7, col9 are parents of its hirarchy >> > >respectively. >> > > >> > >4, the aggregation groups are: my_date, col2, col3, col4, col5, col6, >> > >col7, >> > >col8, col9, col10 >> > > >> > >5, the rowkeys are: my_date, col2, col3, col4, col5, col6, col7, col8, >> > >col9, col10 >> > > >> > > >> > >NOTE THAT: in the cube, the range of my_date is between 2013-10-01 and >> > >2015-03-04 >> > > >> > > >> > >2 kinds of issue as below: >> > > >> > >QUERY: >> > >1, select my_date, sum(f1), sum(f2), sum(f3) from test where >> > >my_date>='2013-01-01' and my_date<= '2015-01-31' >> > >note "my_date" is HIVE date type. and the error: Cannot apply '>=' to >> > >arguments of type '<DATE> >= <CHAR(10)>'. Supported form(s): >> > >'<COMPARABLE_TYPE> >= <COMPARABLE_TYPE>' while executing SQL, how >> should I >> > >modify the SQL? >> > > >> > >2, select my_date, col3, col4, sum(f1), sum(f2), sum(f3) from test >> group >> > >by >> > >my_date, col3, col4, order by my_date desc, col3 desc, col4 desc; >> > > >> > >suppose that the actual row count of the result is GREATER THAN the >> > >default >> > >limit 50000; >> > > >> > >then, the result is: >> > > >> > >my_date, col3, col4, sum(f1), sum(f2), sum(f3) >> > >2013-10-23 >> > >2013-10-23 >> > >... >> > >2013-10-22 >> > >2013-10-22 >> > >... >> > >2013-10-21 >> > >2013-10-21 >> > >... >> > >... >> > >... >> > >2013-10-01 >> > >2013-10-01(note 2013-10-01 is the earliest day of the cube) >> > > >> > >totally, there are 50000 rows for the result, but it is obvious that >> the >> > >order by and limit doesn't work correctly. >> > > >> > >3, select my_date, col3, col4, sum(f1), sum(f2), sum(f3) from test >> group >> > >by >> > >my_date, col3, col4, order by my_date desc, col3 desc, col4 desc LIMIT >> 100 >> > > >> > >just adding "LIMIT 100" to the tail of the 2nd query above, then, the >> > >result is: >> > > >> > >my_date, col3, col4, sum(f1), sum(f2), sum(f3) >> > >2013-10-05 >> > >2013-10-05 >> > >.... >> > >2013-10-05 >> > > >> > >totally, there are 100 rows. all of which are "2013-10-05", apparently, >> > >the >> > >result is not correct as well. >> > > >> > >4, select my_date, sum(f1), sum(f2), sum(f3) from test group by my_date >> > >order by my_date desc LIMIT 100 >> > >if I just add "my_date" as "aggreation level" as above, the result of >> the >> > >4th query is correct. >> > >> > >> > >
