Re: get wrong result for the query (both in kylin-0.71 and kylin-0.6x)

Santosh Akhilesh Thu, 05 Mar 2015 18:38:26 -0800

Hi Hongbin,
OK, I will add a group by and check. But can you please also check why my
dimension table USER is not recognised in the query? Also, the query fails if I
add the column VALUE, which is in the fact table and has a measure defined
over it. I have sent my cube request JSON.
Regards
Santosh
On Fri, 6 Mar 2015 at 7:50 am, hongbin ma <[email protected]> wrote:


> hi santoshakhilesh,
>
> In 0.6 versions, Kylin is only meant for aggregated queries, not
> detail-level queries. Try modifying your query by adding a group by clause.
>
> I believe "select * from facttable" will work for your case.
> Kylin can support "select * from facttable" because, when it detects this
> query pattern, it silently adds all the dimension columns and metric columns
> of the fact table to the query, and the query is treated like:
>
> select dim1, dim2, dim3, ..., dimN, metric1, metric2, ..., metricN from facttable
> group by dim1, dim2, dim3, ..., dimN
>
> So the result of "select * from facttable" is aggregated if two rows in the
> fact table share the same dimension values.
>
> In the 0.7 version we introduced a new module called "inverted index".
> With data organized in an inverted index, Kylin can better answer
> detail-level queries (by MPP instead of cubing). However, this is still under
> development and not ready for community use.
>
> thanks
> hongbin
>
>
> On Thu, Mar 5, 2015 at 11:34 PM, Santoshakhilesh <
> [email protected]> wrote:
>
> > Hi All,
> >
> > I am facing a couple of issues with my queries in 0.6.
> >
> > 1. The query result in the web UI is not correct: the following query
> > returns only one record, but if I run count(*) on the same join the
> > result is 1000, which is correct.
> > SELECT
> > AREA.NAME
> > ,AREA.COUNTRY
> > ,AREA.PROVINCE
> > ,AREA.CITY
> > ,RESOURCEFACT.RESOURCEID
> > ,RESOURCEFACT.INDICATORID
> > ,RESOURCEFACT.COLLECTIONTIME
> > ,RESOURCEFACT.PERIOD
> > ,RESOURCEFACT.AREAID
> > ,RESOURCEFACT.CUSTOMERID
> > FROM RESOURCEFACT
> > LEFT JOIN AREA
> > ON RESOURCEFACT.AREAID = AREA.AREAID
> > LEFT JOIN KPI
> > ON RESOURCEFACT.INDICATORID = KPI.INDICATORID
> >
> > 2. The sample query from the cube wizard is below. I have another table,
> > USER. When I run this query after the cube build, Kylin throws an exception
> > saying it can't find the USER table. Even select count(*) on the USER table
> > fails, but the table is there in Hive. Also, the column RESOURCEFACT.VALUE
> > is not recognized by SQL queries in the web UI.
> >
> > SELECT
> > AREA.NAME
> > ,AREA.COUNTRY
> > ,AREA.PROVINCE
> > ,AREA.CITY
> > ,RESOURCEFACT.RESOURCEID
> > ,RESOURCEFACT.INDICATORID
> > ,RESOURCEFACT.COLLECTIONTIME
> > ,RESOURCEFACT.PERIOD
> > ,RESOURCEFACT.VALUE
> > ,RESOURCEFACT.AREAID
> > ,RESOURCEFACT.CUSTOMERID
> > FROM RESOURCEFACT
> > LEFT JOIN AREA
> > ON RESOURCEFACT.AREAID = AREA.AREAID
> > LEFT JOIN USER
> > ON RESOURCEFACT.CUSTOMERID = USER.USERID
> > LEFT JOIN KPI
> > ON RESOURCEFACT.INDICATORID = KPI.INDICATORID
> >
> > My cube JSON is as below.
> > {
> >   "uuid": "95204f18-24f7-43e5-bead-368900f74f9c",
> >   "name": "secondcube",
> >   "description": "",
> >   "dimensions": [
> >     {
> >       "id": 1,
> >       "name": "RESOURCEGROUP",
> >       "join": {
> >         "type": "left",
> >         "primary_key": [
> >           "AREAID"
> >         ],
> >         "foreign_key": [
> >           "AREAID"
> >         ]
> >       },
> >       "hierarchy": [
> >         {
> >           "level": "1",
> >           "column": "NAME"
> >         },
> >         {
> >           "level": "2",
> >           "column": "COUNTRY"
> >         },
> >         {
> >           "level": "3",
> >           "column": "PROVINCE"
> >         },
> >         {
> >           "level": "4",
> >           "column": "CITY"
> >         }
> >       ],
> >       "table": "AREA",
> >       "column": null,
> >       "datatype": null,
> >       "derived": null
> >     },
> >     {
> >       "id": 2,
> >       "name": "RESOURCEFACT.RESOURCEID",
> >       "join": null,
> >       "hierarchy": null,
> >       "table": "RESOURCEFACT",
> >       "column": "RESOURCEID",
> >       "datatype": null,
> >       "derived": null
> >     },
> >     {
> >       "id": 3,
> >       "name": "RESOURCEFACT.INDICATORID",
> >       "join": null,
> >       "hierarchy": null,
> >       "table": "RESOURCEFACT",
> >       "column": "INDICATORID",
> >       "datatype": null,
> >       "derived": null
> >     },
> >     {
> >       "id": 4,
> >       "name": "RESOURCEFACT.COLLECTIONTIME",
> >       "join": null,
> >       "hierarchy": null,
> >       "table": "RESOURCEFACT",
> >       "column": "COLLECTIONTIME",
> >       "datatype": null,
> >       "derived": null
> >     },
> >     {
> >       "id": 5,
> >       "name": "RESOURCEFACT.PERIOD",
> >       "join": null,
> >       "hierarchy": null,
> >       "table": "RESOURCEFACT",
> >       "column": "PERIOD",
> >       "datatype": null,
> >       "derived": null
> >     },
> >     {
> >       "id": 6,
> >       "name": "RESOURCEFACT.VALUE",
> >       "join": null,
> >       "hierarchy": null,
> >       "table": "RESOURCEFACT",
> >       "column": "VALUE",
> >       "datatype": null,
> >       "derived": null
> >     },
> >     {
> >       "id": 7,
> >       "name": "RESOURCEFACT.AREAID",
> >       "join": null,
> >       "hierarchy": null,
> >       "table": "RESOURCEFACT",
> >       "column": "AREAID",
> >       "datatype": null,
> >       "derived": null
> >     },
> >     {
> >       "id": 8,
> >       "name": "RESOURCEFACT.CUSTOMERID",
> >       "join": null,
> >       "hierarchy": null,
> >       "table": "RESOURCEFACT",
> >       "column": "CUSTOMERID",
> >       "datatype": null,
> >       "derived": null
> >     },
> >     {
> >       "id": 9,
> >       "name": "USER_DERIVED",
> >       "join": {
> >         "type": "left",
> >         "primary_key": [
> >           "USERID"
> >         ],
> >         "foreign_key": [
> >           "CUSTOMERID"
> >         ]
> >       },
> >       "hierarchy": null,
> >       "table": "USER",
> >       "column": "{FK}",
> >       "datatype": null,
> >       "derived": [
> >         "USERID",
> >         "NAME"
> >       ]
> >     },
> >     {
> >       "id": 10,
> >       "name": "KPI_DERIVED",
> >       "join": {
> >         "type": "left",
> >         "primary_key": [
> >           "INDICATORID"
> >         ],
> >         "foreign_key": [
> >           "INDICATORID"
> >         ]
> >       },
> >       "hierarchy": null,
> >       "table": "KPI",
> >       "column": "{FK}",
> >       "datatype": null,
> >       "derived": [
> >         "INDICATORID",
> >         "NAME"
> >       ]
> >     },
> >     {
> >       "id": 11,
> >       "name": "AREA_DERIVED",
> >       "join": {
> >         "type": "left",
> >         "primary_key": [
> >           "AREAID"
> >         ],
> >         "foreign_key": [
> >           "AREAID"
> >         ]
> >       },
> >       "hierarchy": null,
> >       "table": "AREA",
> >       "column": "{FK}",
> >       "datatype": null,
> >       "derived": [
> >         "AREAID"
> >       ]
> >     }
> >   ],
> >   "measures": [
> >     {
> >       "id": 1,
> >       "name": "_COUNT_",
> >       "function": {
> >         "expression": "COUNT",
> >         "parameter": {
> >           "type": "constant",
> >           "value": "1"
> >         },
> >         "returntype": "bigint"
> >       },
> >       "dependent_measure_ref": null
> >     },
> >     {
> >       "id": 2,
> >       "name": "SUMVALUE",
> >       "function": {
> >         "expression": "SUM",
> >         "parameter": {
> >           "type": "column",
> >           "value": "VALUE"
> >         },
> >         "returntype": "bigint"
> >       },
> >       "dependent_measure_ref": null
> >     }
> >   ],
> >   "rowkey": {
> >     "rowkey_columns": [
> >       {
> >         "column": "NAME",
> >         "length": 0,
> >         "dictionary": "true",
> >         "mandatory": false
> >       },
> >       {
> >         "column": "COUNTRY",
> >         "length": 0,
> >         "dictionary": "true",
> >         "mandatory": false
> >       },
> >       {
> >         "column": "PROVINCE",
> >         "length": 0,
> >         "dictionary": "true",
> >         "mandatory": false
> >       },
> >       {
> >         "column": "CITY",
> >         "length": 0,
> >         "dictionary": "true",
> >         "mandatory": false
> >       },
> >       {
> >         "column": "RESOURCEID",
> >         "length": 0,
> >         "dictionary": "true",
> >         "mandatory": false
> >       },
> >       {
> >         "column": "INDICATORID",
> >         "length": 0,
> >         "dictionary": "true",
> >         "mandatory": false
> >       },
> >       {
> >         "column": "COLLECTIONTIME",
> >         "length": 0,
> >         "dictionary": "true",
> >         "mandatory": false
> >       },
> >       {
> >         "column": "PERIOD",
> >         "length": 0,
> >         "dictionary": "true",
> >         "mandatory": false
> >       },
> >       {
> >         "column": "VALUE",
> >         "length": 0,
> >         "dictionary": "true",
> >         "mandatory": false
> >       },
> >       {
> >         "column": "AREAID",
> >         "length": 0,
> >         "dictionary": "true",
> >         "mandatory": false
> >       },
> >       {
> >         "column": "CUSTOMERID",
> >         "length": 0,
> >         "dictionary": "true",
> >         "mandatory": false
> >       }
> >     ],
> >     "aggregation_groups": [
> >       [
> >         "RESOURCEID",
> >         "INDICATORID",
> >         "COLLECTIONTIME",
> >         "PERIOD",
> >         "VALUE",
> >         "AREAID",
> >         "CUSTOMERID"
> >       ],
> >       [
> >         "NAME",
> >         "COUNTRY",
> >         "PROVINCE",
> >         "CITY"
> >       ]
> >     ]
> >   },
> >   "signature": "1CSqztliEqN4l0cOPrkFdA==",
> >   "capacity": "MEDIUM",
> >   "last_modified": 1425587998627,
> >   "fact_table": "RESOURCEFACT",
> >   "null_string": null,
> >   "filter_condition": null,
> >   "cube_partition_desc": {
> >     "partition_date_column": null,
> >     "partition_date_start": 0,
> >     "cube_partition_type": "APPEND"
> >   },
> >   "hbase_mapping": {
> >     "column_family": [
> >       {
> >         "name": "F1",
> >         "columns": [
> >           {
> >             "qualifier": "M",
> >             "measure_refs": [
> >               "_COUNT_",
> >               "SUMVALUE"
> >             ]
> >           }
> >         ]
> >       }
> >     ]
> >   },
> >   "notify_list": []
> > }
> >
> > Regards,
> > Santosh Akhilesh
> > Bangalore R&D
> > HUAWEI TECHNOLOGIES CO.,LTD.
> >
> > www.huawei.com
> >
> > ------------------------------------------------------------
> -------------------------------------------------------------------------
> > This e-mail and its attachments contain confidential information from
> > HUAWEI, which
> > is intended only for the person or entity whose address is listed above.
> > Any use of the
> > information contained herein in any way (including, but not limited to,
> > total or partial
> > disclosure, reproduction, or dissemination) by persons other than the
> > intended
> > recipient(s) is prohibited. If you receive this e-mail in error, please
> > notify the sender by
> > phone or email immediately and delete it!
> >
> > ________________________________________
> > From: Santoshakhilesh
> > Sent: Thursday, March 05, 2015 8:36 PM
> > To: [email protected]
> > Subject: RE: get wrong result for the query (both in kylin-0.71 and kylin-0.6x)
> >
> > hi dong,
> >   I am facing the same issue: when I run count(*) on a query I get the
> > correct result, but when I do select * on the same query I get only 1
> > result, although there are 1000 records as shown by the client.
> >
> > I am not getting the following notification; I am using 0.6.5:
> >
> > *Note: Current results are partial, please click 'Show All' button to get
> > all results.*
> > Click the "Show All" button to disable the "AcceptPartialResults" mode, and
> > you'll get the right result.
> >
> >
> > Regards,
> > Santosh Akhilesh
> > Bangalore R&D
> > HUAWEI TECHNOLOGIES CO.,LTD.
> >
> > www.huawei.com
> >
> >
> > ________________________________________
> > From: hongbin ma [[email protected]]
> > Sent: Thursday, March 05, 2015 11:55 AM
> > To: [email protected]
> > Subject: Re: get wrong result for the query (both in kylin-0.71 and kylin-0.6x)
> >
> > hi dong,
> >
> > we looked into it, here's the reason:
> >
> > By default, if you're making queries on the web client, a mode called
> > "AcceptPartialResults" is enabled. This is a protection mechanism that
> > returns only part of the results to reduce server overhead. Honestly, it
> > might hurt the correctness of order by queries.
> >
> > If you're seeking 100% correctness, after running the query you will find
> > a notification:
> > *Note: Current results are partial, please click 'Show All' button to get
> > all results.*
> > Click the "Show All" button to disable the "AcceptPartialResults" mode, and
> > you'll get the right result.
> >
> > Note that "AcceptPartialResults" is only enabled by default in the web
> > client; you won't meet such problems if you're using JDBC, ODBC or the
> > standard REST API.
> >
> > thanks
> > hongbin
> >
> >
> > On Thu, Mar 5, 2015 at 11:33 AM, dong wang <[email protected]> wrote:
> >
> > > Sorry hongbin, my mistake: for the 2nd question, the row count is LESS
> > > THAN 50000
> > >
> > > 2015-03-05 11:28 GMT+08:00 dong wang <[email protected]>:
> > >
> > > > HI hongbin,
> > > > 1, yes, as you guessed, the result of the above query is CORRECT if we
> > > > use ASC order; thus, the issue only occurs with DESC order.
> > > > 2, the total row count of the result is definitely GREATER THAN 50000,
> > > > even much more
> > > >
> > > > 2015-03-05 11:06 GMT+08:00 hongbin ma <[email protected]>:
> > > >
> > > >> for #2 to #4, will Kylin return correct answers if you order by
> > > >> ascending order?
> > > >> And what is the total result count of #4?
> > > >>
> > > >> On Thu, Mar 5, 2015 at 10:56 AM, Zhou, Qianhao <[email protected]> wrote:
> > > >>
> > > >> > #1 when adding a condition on a date type column, you should put
> > > >> > the keyword date before the date string
> > > >> >
> > > >> > the sql should be like:
> > > >> > select my_date, sum(f1), sum(f2), sum(f3) from test where
> > > >> > my_date >= date'2013-01-01' and my_date <= date'2015-01-31'
> > > >> >
> > > >> >
> > > >> >
> > > >> > Best Regard
> > > >> > Zhou QianHao
> > > >> >
> > > >> >
> > > >> >
> > > >> >
> > > >> >
> > > >> > On 3/5/15, 10:40 AM, "dong wang" <[email protected]> wrote:
> > > >> >
> > > >> > >Hi all, I have been getting wrong results for quite a long time,
> > > >> > >since kylin-0.6x, so I really hope that someone can help.
> > > >> > >
> > > >> > >CUBE:
> > > >> > >1, the cube is based on just one big fact table. there are 10
> > > >> > >attributes and 3 metrics.
> > > >> > >for example: my_date, col2, col3, col4, col5, col6, col7, col8,
> > > >> > >col9, col10, sum(f1), sum(f2), sum(f3)
> > > >> > >
> > > >> > >2, my_date and col2 are normal columns, and my_date is the partition
> > > >> > >attribute, which is date type in HIVE,
> > > >> > >
> > > >> > >3, (col3, col4), (col5, col6), (col7, col8), (col9, col10) are 4
> > > >> > >hierarchies, and col3, col5, col7, col9 are the parents of their
> > > >> > >respective hierarchies.
> > > >> > >
> > > >> > >4, the aggregation groups are: my_date, col2, col3, col4, col5,
> > > >> > >col6, col7, col8, col9, col10
> > > >> > >
> > > >> > >5, the rowkeys are: my_date, col2, col3, col4, col5, col6, col7,
> > > >> > >col8, col9, col10
> > > >> > >
> > > >> > >
> > > >> > >NOTE THAT: in the cube, the range of my_date is between 2013-10-01
> > > >> > >and 2015-03-04
> > > >> > >
> > > >> > >
> > > >> > >2 kinds of issue as below:
> > > >> > >
> > > >> > >QUERY:
> > > >> > >1, select my_date, sum(f1), sum(f2), sum(f3) from test where
> > > >> > >my_date>='2013-01-01' and my_date<= '2015-01-31'
> > > >> > >note "my_date" is HIVE date type. and the error: Cannot apply '>='
> > > >> > >to arguments of type '<DATE> >= <CHAR(10)>'. Supported form(s):
> > > >> > >'<COMPARABLE_TYPE> >= <COMPARABLE_TYPE>' while executing SQL, how
> > > >> > >should I modify the SQL?
> > > >> > >
> > > >> > >2, select my_date, col3, col4, sum(f1), sum(f2), sum(f3) from test
> > > >> > >group by my_date, col3, col4
> > > >> > >order by my_date desc, col3 desc, col4 desc;
> > > >> > >
> > > >> > >suppose that the actual row count of the result is GREATER THAN the
> > > >> > >default limit 50000;
> > > >> > >
> > > >> > >then, the result is:
> > > >> > >
> > > >> > >my_date, col3, col4, sum(f1), sum(f2), sum(f3)
> > > >> > >2013-10-23
> > > >> > >2013-10-23
> > > >> > >...
> > > >> > >2013-10-22
> > > >> > >2013-10-22
> > > >> > >...
> > > >> > >2013-10-21
> > > >> > >2013-10-21
> > > >> > >...
> > > >> > >...
> > > >> > >...
> > > >> > >2013-10-01
> > > >> > >2013-10-01(note 2013-10-01 is the earliest day of the cube)
> > > >> > >
> > > >> > >in total, there are 50000 rows in the result, but it is obvious
> > > >> > >that the order by and limit don't work correctly.
> > > >> > >
> > > >> > >3, select my_date, col3, col4, sum(f1), sum(f2), sum(f3) from test
> > > >> > >group by my_date, col3, col4
> > > >> > >order by my_date desc, col3 desc, col4 desc LIMIT 100
> > > >> > >
> > > >> > >just adding "LIMIT 100" to the tail of the 2nd query above, the
> > > >> > >result is:
> > > >> > >
> > > >> > >my_date, col3, col4, sum(f1), sum(f2), sum(f3)
> > > >> > >2013-10-05
> > > >> > >2013-10-05
> > > >> > >....
> > > >> > >2013-10-05
> > > >> > >
> > > >> > >in total, there are 100 rows, all of which are "2013-10-05";
> > > >> > >apparently, the result is not correct as well.
> > > >> > >
> > > >> > >4, select my_date, sum(f1), sum(f2), sum(f3) from test group by
> > > >> > >my_date order by my_date desc LIMIT 100
> > > >> > >if I use just "my_date" as the "aggregation level" as above, the
> > > >> > >result of the 4th query is correct.
> > > >> >
> > > >> >
> > > >>
> > > >
> > > >
> > >
> >
>
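
The rewrite described in hongbin's quoted reply ("select *" treated as a group
by over all dimension columns) can be illustrated with a small sketch. This is
not Kylin code: it uses Python's built-in sqlite3 module only to show the SQL
semantics, the table and column names (facttable, dim1, dim2, metric1) are
hypothetical, and SUM is assumed as the measure function. It shows why
count(*) can report more rows than the grouped form of the query returns when
several fact rows share the same dimension values.

```python
import sqlite3

# Toy fact table; names and data are hypothetical, not from any real cube.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facttable (dim1 TEXT, dim2 TEXT, metric1 INTEGER)")
conn.executemany(
    "INSERT INTO facttable VALUES (?, ?, ?)",
    [
        ("a", "x", 10),
        ("a", "x", 5),   # same dimension values as the row above
        ("b", "y", 7),
    ],
)

# Detail-level count: one per raw fact row.
raw_count = conn.execute("SELECT COUNT(*) FROM facttable").fetchone()[0]

# The grouped form: every dimension in GROUP BY, the metric aggregated
# (SUM is an assumption made for this sketch).
aggregated = conn.execute(
    "SELECT dim1, dim2, SUM(metric1) FROM facttable "
    "GROUP BY dim1, dim2 ORDER BY dim1, dim2"
).fetchall()

print(raw_count)    # 3
print(aggregated)   # [('a', 'x', 15), ('b', 'y', 7)]
```

In this toy data the two rows with dimensions ('a', 'x') collapse into one, so
the grouped query returns 2 rows while count(*) reports 3. Whether that is the
whole explanation for the 1000 vs 1 discrepancy reported in this thread is not
confirmed here.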
