Hi hongbin,
1, with your tips for the web GUI, yes, it now returns the correct result
as expected.
2, for the REST API, there still seem to be some problems, as explained
below (in PHP):
====================== REST query ======================
$authorizationCode = base64_encode('username:password');
$sql_str = 'select my_date from test order by my_date desc limit 10;';
// much more than 50000 rows
$postData = array(
    'sql'           => $sql_str,
    'offset'        => 0,
    'limit'         => 7,
    'acceptPartial' => false,
    'project'       => 'default'
);

// Set up cURL
$url = "http://*.*.*.*/kylin/api/query";
$ch = curl_init($url);
curl_setopt_array($ch, array(
    CURLOPT_POST           => TRUE,
    CURLOPT_RETURNTRANSFER => TRUE,
    CURLOPT_HTTPHEADER     => array(
        'Authorization: Basic ' . $authorizationCode,
        'Content-Type: application/json;charset=utf-8'
    ),
    CURLOPT_POSTFIELDS     => json_encode($postData)
));

// Send the request and dump the decoded JSON response
$response = curl_exec($ch);
curl_close($ch);
var_dump(json_decode($response, true));
===================== piece of the result of the REST API query =====================
["results"]=>
array(2) {
[0]=>
array(1) {
[0]=>
string(10) "2015-03-02"
}
[1]=>
array(1) {
[0]=>
string(10) "2013-10-01"
}
}
["cube"]=>
string(12) "test"
["affectedRowCount"]=>
int(0)
["isException"]=>
bool(false)
["exceptionMessage"]=>
NULL
["duration"]=>
int(92)
["totalScanCount"]=>
int(2)
["hitCache"]=>
bool(false)
["partial"]=>
bool(false)
Note that the actual row count of the query result should be much greater
than 50000, and the date range of the cube is from 2013-10-01 to
2015-03-02. What's more, even though I set 'acceptPartial' => false as
above, I still get only the 2 rows shown (and no cache hit, as
displayed): one is the earliest date, the other is the latest.
2015-03-05 14:25 GMT+08:00 hongbin ma <[email protected]>:
> hi dong,
>
> we looked into it, here's the reason:
>
> By default, if you're making queries on the web client, a mode called
> "AcceptPartialResults" is enabled. This is a protection mechanism that
> will only return part of the results to reduce server overhead. Honestly it
> might hurt the correctness of order-by queries.
>
> If you're seeking 100% correctness, after running the query you will find a
> notification:
> *Note: Current results are partial, please click 'Show All' button to get
> all results.*
> Click the "Show All" button to disable the "AcceptPartialResults" mode, and
> you'll get a right result.
>
> Notice "AcceptPartialResults" is only enabled by default at web client,
> you'll not meet such problems if you're using JDBC, ODBC or standard REST
> API.
>
> thanks
> hongbin
>
>
> On Thu, Mar 5, 2015 at 11:33 AM, dong wang <[email protected]> wrote:
>
> > Sorry hongbin, it's my fault: for the 2nd question, the row count is LESS
> > THAN 50000
> >
> > 2015-03-05 11:28 GMT+08:00 dong wang <[email protected]>:
> >
> > > Hi hongbin,
> > > 1, yes, as you guessed, the result of the above query is CORRECT if we
> > > use ASC order; thus, the issue only occurs with DESC order.
> > > 2, the total row count of the result is definitely GREATER THAN 50000,
> > > in fact much more
> > >
> > > 2015-03-05 11:06 GMT+08:00 hongbin ma <[email protected]>:
> > >
> > >> for #2 to #4, will Kylin return correct answers if ordered in ascending
> > >> order?
> > >> And what is the total result count of #4?
> > >>
> > >> > On Thu, Mar 5, 2015 at 10:56 AM, Zhou, Qianhao <[email protected]> wrote:
> > >>
> > >> > #1: when adding a condition on a date-typed column, you should put
> > >> > the date keyword before the date string.
> > >> >
> > >> > The sql should be like:
> > >> > select my_date, sum(f1), sum(f2), sum(f3) from test where
> > >> > my_date >= date'2013-01-01' and my_date <= date'2015-01-31'
> > >> >
> > >> >
> > >> >
> > >> > Best Regard
> > >> > Zhou QianHao
> > >> >
> > >> >
> > >> >
> > >> >
> > >> >
> > >> > On 3/5/15, 10:40 AM, "dong wang" <[email protected]> wrote:
> > >> >
> > >> > >Hi all, I have experienced wrong results for quite a long time,
> > >> > >since kylin-0.6x; thus, I really hope that someone can help.
> > >> > >
> > >> > >CUBE:
> > >> > >1, the cube is based on just one big fact table. There are 10
> > >> > >attributes and 3 metrics,
> > >> > >for example: my_date, col2, col3, col4, col5, col6, col7, col8,
> > >> > >col9, col10, sum(f1), sum(f2), sum(f3)
> > >> > >
> > >> > >2, my_date and col2 are normal columns, and my_date is the
> > >> > >partition attribute, which is date type in HIVE.
> > >> > >
> > >> > >3, (col3, col4), (col5, col6), (col7, col8), (col9, col10) are 4
> > >> > >hierarchies, and col3, col5, col7, col9 are the parents of their
> > >> > >hierarchies respectively.
> > >> > >
> > >> > >4, the aggregation groups are: my_date, col2, col3, col4, col5,
> > >> > >col6, col7, col8, col9, col10
> > >> > >
> > >> > >5, the rowkeys are: my_date, col2, col3, col4, col5, col6, col7,
> > >> > >col8, col9, col10
> > >> > >
> > >> > >
> > >> > >NOTE THAT: in the cube, the range of my_date is between 2013-10-01
> > >> > >and 2015-03-04
> > >> > >
> > >> > >
> > >> > >2 kinds of issues, as below:
> > >> > >
> > >> > >QUERY:
> > >> > >1, select my_date, sum(f1), sum(f2), sum(f3) from test where
> > >> > >my_date >= '2013-01-01' and my_date <= '2015-01-31'
> > >> > >Note "my_date" is the HIVE date type, and I get the error: Cannot
> > >> > >apply '>=' to arguments of type '<DATE> >= <CHAR(10)>'. Supported
> > >> > >form(s): '<COMPARABLE_TYPE> >= <COMPARABLE_TYPE>' while executing
> > >> > >SQL. How should I modify the SQL?
> > >> > >
> > >> > >2, select my_date, col3, col4, sum(f1), sum(f2), sum(f3) from test
> > >> > >group by my_date, col3, col4 order by my_date desc, col3 desc,
> > >> > >col4 desc;
> > >> > >
> > >> > >suppose that the actual row count of the result is GREATER THAN
> > >> > >the default limit of 50000;
> > >> > >
> > >> > >then, the result is:
> > >> > >
> > >> > >my_date, col3, col4, sum(f1), sum(f2), sum(f3)
> > >> > >2013-10-23
> > >> > >2013-10-23
> > >> > >...
> > >> > >2013-10-22
> > >> > >2013-10-22
> > >> > >...
> > >> > >2013-10-21
> > >> > >2013-10-21
> > >> > >...
> > >> > >...
> > >> > >...
> > >> > >2013-10-01
> > >> > >2013-10-01 (note: 2013-10-01 is the earliest day of the cube)
> > >> > >
> > >> > >totally, there are 50000 rows in the result, but it is obvious
> > >> > >that the order by and limit don't work correctly.
> > >> > >
> > >> > >3, select my_date, col3, col4, sum(f1), sum(f2), sum(f3) from test
> > >> > >group by my_date, col3, col4 order by my_date desc, col3 desc,
> > >> > >col4 desc LIMIT 100
> > >> > >
> > >> > >just adding "LIMIT 100" to the tail of the 2nd query above; then,
> > >> > >the result is:
> > >> > >
> > >> > >my_date, col3, col4, sum(f1), sum(f2), sum(f3)
> > >> > >2013-10-05
> > >> > >2013-10-05
> > >> > >....
> > >> > >2013-10-05
> > >> > >
> > >> > >totally, there are 100 rows, all of which are "2013-10-05";
> > >> > >apparently, the result is not correct either.
> > >> > >
> > >> > >4, select my_date, sum(f1), sum(f2), sum(f3) from test group by
> > >> > >my_date order by my_date desc LIMIT 100
> > >> > >if I just add "my_date" as the "aggregation level" as above, the
> > >> > >result of this 4th query is correct.
> > >> >
> > >> >
> > >>
> > >
> > >
> >
>