Hi hongbin,
1, with your tips for the web GUI, yes, it now returns the correct result
as expected.
2, for the REST API, there still seem to be some problems, as explained
below (in PHP):
====================== REST query ======================
$authorizationCode = base64_encode('username:password');
$sql_str = 'select my_date from test order by my_date desc limit 10;';
// much more than 50000 rows
$postData = array(
    'sql'           => $sql_str,
    'offset'        => 0,
    'limit'         => 7,
    'acceptPartial' => false,
    'project'       => 'default'
);

// Set up cURL
$url = "http://*.*.*.*/kylin/api/query";
$ch = curl_init($url);
curl_setopt_array($ch, array(
    CURLOPT_POST           => TRUE,
    CURLOPT_RETURNTRANSFER => TRUE,
    CURLOPT_HTTPHEADER     => array(
        'Authorization: Basic ' . $authorizationCode,
        'Content-Type: application/json;charset=utf-8'
    ),
    CURLOPT_POSTFIELDS     => json_encode($postData)
));

// Send the request and dump the decoded JSON response
$response = curl_exec($ch);
curl_close($ch);
var_dump(json_decode($response, true));
===================== piece of the result of the REST API query =====================
["results"]=>
array(2) {
[0]=>
array(1) {
[0]=>
string(10) "2015-03-02"
}
[1]=>
array(1) {
[0]=>
string(10) "2013-10-01"
}
}
["cube"]=>
string(12) "test"
["affectedRowCount"]=>
int(0)
["isException"]=>
bool(false)
["exceptionMessage"]=>
NULL
["duration"]=>
int(92)
["totalScanCount"]=>
int(2)
["hitCache"]=>
bool(false)
["partial"]=>
bool(false)
Note that the actual row count of the query result should be much greater
than 50000, and the date range of the cube is from 2013-10-01 to
2015-03-02. What's more, even though I set 'acceptPartial' => false as
above, I still get only the 2 rows shown (and no cache hit, as
displayed): one is the earliest date, the other is the latest.
2015-03-05 14:25 GMT+08:00 hongbin ma <[email protected]>:
> hi dong,
>
> we looked into it, here's the reason:
>
> By default, if you're making queries on the web client, a mode called
> "AcceptPartialResults" is enabled. This is a protection mechanism that
> will only return part of the results to reduce server overhead. Honestly it
> might hurt the correctness of order-by queries.
>
> If you're seeking 100% correctness, after running the query you will find a
> notification:
> *Note: Current results are partial, please click 'Show All' button to get
> all results.*
> Click the "Show All" button to disable the "AcceptPartialResults" mode, and
> you'll get a right result.
>
> Notice "AcceptPartialResults" is only enabled by default at web client,
> you'll not meet such problems if you're using JDBC, ODBC or standard REST
> API.
>
> thanks
> hongbin
>
>
> On Thu, Mar 5, 2015 at 11:33 AM, dong wang <[email protected]> wrote:
>
> > Sorry hongbin, it's my fault: for the 2nd question, the row count is LESS
> > THAN 50000
> >
> > 2015-03-05 11:28 GMT+08:00 dong wang <[email protected]>:
> >
> > > Hi hongbin,
> > > 1, yes, as you guessed, the result of the above query is CORRECT if we
> > > use ASC order; thus, the issue only occurs with DESC order.
> > > 2, the total row count of the result is definitely GREATER THAN 50000,
> > > in fact much more
> > >
> > > 2015-03-05 11:06 GMT+08:00 hongbin ma <[email protected]>:
> > >
> > >> for #2 to #4, will Kylin return correct answers if ordered in ascending
> > >> order?
> > >> And what is the total result count of #4?
> > >>
> > >> > On Thu, Mar 5, 2015 at 10:56 AM, Zhou, Qianhao <[email protected]> wrote:
> > >>
> > >> > #1: when adding a condition on a date-typed column, you should put
> > >> > the date keyword before the date string.
> > >> >
> > >> > The sql should be like:
> > >> > select my_date, sum(f1), sum(f2), sum(f3) from test where
> > >> > my_date >= date'2013-01-01' and my_date <= date'2015-01-31'
> > >> >
> > >> >
> > >> >
> > >> > Best Regard
> > >> > Zhou QianHao
> > >> >
> > >> >
> > >> >
> > >> >
> > >> >
> > >> > On 3/5/15, 10:40 AM, "dong wang" <[email protected]> wrote:
> > >> >
> > >> > >Hi all, I have experienced wrong results for quite a long time,
> > >> > >since kylin-0.6x; thus, I really hope that someone can help.
> > >> > >
> > >> > >CUBE:
> > >> > >1, the cube is based on just one big fact table. There are 10
> > >> > >attributes and 3 metrics,
> > >> > >for example: my_date, col2, col3, col4, col5, col6, col7, col8,
> > >> > >col9, col10, sum(f1), sum(f2), sum(f3)
> > >> > >
> > >> > >2, my_date and col2 are normal columns, and my_date is the
> > >> > >partition attribute, which is date type in HIVE.
> > >> > >
> > >> > >3, (col3, col4), (col5, col6), (col7, col8), (col9, col10) are 4
> > >> > >hierarchies, and col3, col5, col7, col9 are the parents of their
> > >> > >hierarchies respectively.
> > >> > >
> > >> > >4, the aggregation groups are: my_date, col2, col3, col4, col5,
> > >> > >col6, col7, col8, col9, col10
> > >> > >
> > >> > >5, the rowkeys are: my_date, col2, col3, col4, col5, col6, col7,
> > >> > >col8, col9, col10
> > >> > >
> > >> > >
> > >> > >NOTE THAT: in the cube, the range of my_date is between 2013-10-01
> > >> > >and 2015-03-04
> > >> > >
> > >> > >
> > >> > >2 kinds of issues, as below:
> > >> > >
> > >> > >QUERY:
> > >> > >1, select my_date, sum(f1), sum(f2), sum(f3) from test where
> > >> > >my_date >= '2013-01-01' and my_date <= '2015-01-31'
> > >> > >Note "my_date" is the HIVE date type, and I get the error: Cannot
> > >> > >apply '>=' to arguments of type '<DATE> >= <CHAR(10)>'. Supported
> > >> > >form(s): '<COMPARABLE_TYPE> >= <COMPARABLE_TYPE>' while executing
> > >> > >SQL. How should I modify the SQL?
> > >> > >
> > >> > >2, select my_date, col3, col4, sum(f1), sum(f2), sum(f3) from test
> > >> > >group by my_date, col3, col4 order by my_date desc, col3 desc,
> > >> > >col4 desc;
> > >> > >
> > >> > >suppose that the actual row count of the result is GREATER THAN
> > >> > >the default limit of 50000;
> > >> > >
> > >> > >then, the result is:
> > >> > >
> > >> > >my_date, col3, col4, sum(f1), sum(f2), sum(f3)
> > >> > >2013-10-23
> > >> > >2013-10-23
> > >> > >...
> > >> > >2013-10-22
> > >> > >2013-10-22
> > >> > >...
> > >> > >2013-10-21
> > >> > >2013-10-21
> > >> > >...
> > >> > >...
> > >> > >...
> > >> > >2013-10-01
> > >> > >2013-10-01 (note: 2013-10-01 is the earliest day of the cube)
> > >> > >
> > >> > >totally, there are 50000 rows in the result, but it is obvious
> > >> > >that the order by and limit don't work correctly.
> > >> > >
> > >> > >3, select my_date, col3, col4, sum(f1), sum(f2), sum(f3) from test
> > >> > >group by my_date, col3, col4 order by my_date desc, col3 desc,
> > >> > >col4 desc LIMIT 100
> > >> > >
> > >> > >just adding "LIMIT 100" to the tail of the 2nd query above; then,
> > >> > >the result is:
> > >> > >
> > >> > >my_date, col3, col4, sum(f1), sum(f2), sum(f3)
> > >> > >2013-10-05
> > >> > >2013-10-05
> > >> > >....
> > >> > >2013-10-05
> > >> > >
> > >> > >totally, there are 100 rows, all of which are "2013-10-05";
> > >> > >apparently, the result is not correct either.
> > >> > >
> > >> > >4, select my_date, sum(f1), sum(f2), sum(f3) from test group by
> > >> > >my_date order by my_date desc LIMIT 100
> > >> > >if I just add "my_date" as the "aggregation level" as above, the
> > >> > >result of this 4th query is correct.
> > >> >
> > >> >
> > >>
> > >
> > >
> >
>