Re: get wrong result for the query(both in kyin-0.71 and kylin-0.6x)

Santosh Akhilesh Thu, 05 Mar 2015 18:50:25 -0800

Hi Hongbin,
If you see my queries I am doing left join dim tables with fact and if I do
a count on it I get correct result. But if I try to select each column in
join to get record out I get only one result. Actually my query is not on
only select * from fact..
So do you mean to say such queries are not supported in kylin 0.6.
Regards
Santosh Akhilesh
On Fri, 6 Mar 2015 at 8:07 am, Santosh Akhilesh <[email protected]>
wrote:


> Hi Hongbin,
> Ok, I will add group by and check. But can you please also check why my
> dim table user is not being recognised in query. And also I can't query if
> I add the column value which is in fact and I have added a measure over it.
> I had sent my cube request Json.
> Regards
> Santosh
> On Fri, 6 Mar 2015 at 7:50 am, hongbin ma <[email protected]> wrote:
>
>> hi santoshakhilesh,
>>
>> In 0.6 versions, Kylin is only meant for aggregated queries, not detailed
>> level queries. Try modifying your query by adding some groupbys.
>>
>> I believe "select * from factable" will work for your case.
>> The reason why you can support "select * from facttable" is that when
>> Kylin
>> detects such query pattern, it will secretly add all the dimension columns
>> and metric columns on fact table to the query and the query will be
>> treated
>> like:
>>
>> select dim1,dim2,dim3.... dimN, metric1,metric2...metricN from facttable
>> groupby dim1, dim2, dim3,,,dimN
>>
>> So the result of "select * from factable" is aggregated if two rows in
>> fact
>> table share the same dimension values.
>>
>> In 0.7 version which we introduced a new module called "inverted index",
>> with data organized in "inverted index" Kylin can better answer detailed
>> level queries (By MPP instead of cubing). However this is still under
>> development and not ready for community use.
>>
>> thanks
>> hongbin
>>
>>
>> On Thu, Mar 5, 2015 at 11:34 PM, Santoshakhilesh <
>> [email protected]> wrote:
>>
>> > Hi All ,
>> >
>> > I am facing couple of issue regarding my queries in 0.6
>> >
>> > 1. Query result in web ui is not correct , following return only one
>> > record , if I run count(*) on same join
>> >     result is 1000 which is correct
>> > SELECT
>> > AREA.NAME
>> > ,AREA.COUNTRY
>> > ,AREA.PROVINCE
>> > ,AREA.CITY
>> > ,RESOURCEFACT.RESOURCEID
>> > ,RESOURCEFACT.INDICATORID
>> > ,RESOURCEFACT.COLLECTIONTIME
>> > ,RESOURCEFACT.PERIOD
>> > ,RESOURCEFACT.AREAID
>> > ,RESOURCEFACT.CUSTOMERID
>> > FROM RESOURCEFACT
>> > LEFT JOIN AREA
>> > ON RESOURCEFACT.AREAID = AREA.AREAID
>> > LEFT JOIN KPI
>> > ON RESOURCEFACT.INDICATORID = KPI.INDICATORID
>> >
>> > 2. The sample query on cube wizard is following; I have another table
>> User
>> > when I run this query after cube build Kylin throws exception that it
>> cant
>> > find user table. even select count(*) on User table fails. But the
>> table is
>> > there in hive. RESOURCEFACT.VALUE this column cant be recognized by sql
>> > query on web ui.
>> >
>> > SELECT
>> > AREA.NAME
>> > ,AREA.COUNTRY
>> > ,AREA.PROVINCE
>> > ,AREA.CITY
>> > ,RESOURCEFACT.RESOURCEID
>> > ,RESOURCEFACT.INDICATORID
>> > ,RESOURCEFACT.COLLECTIONTIME
>> > ,RESOURCEFACT.PERIOD
>> > ,RESOURCEFACT.VALUE
>> > ,RESOURCEFACT.AREAID
>> > ,RESOURCEFACT.CUSTOMERID
>> > FROM RESOURCEFACT
>> > LEFT JOIN AREA
>> > ON RESOURCEFACT.AREAID = AREA.AREAID
>> > LEFT JOIN USER
>> > ON RESOURCEFACT.CUSTOMERID = USER.USERID
>> > LEFT JOIN KPI
>> > ON RESOURCEFACT.INDICATORID = KPI.INDICATORID
>> >
>> > MY Cube Json is as below.
>> > {
>> >   "uuid": "95204f18-24f7-43e5-bead-368900f74f9c",
>> >   "name": "secondcube",
>> >   "description": "",
>> >   "dimensions": [
>> >     {
>> >       "id": 1,
>> >       "name": "RESOURCEGROUP",
>> >       "join": {
>> >         "type": "left",
>> >         "primary_key": [
>> >           "AREAID"
>> >         ],
>> >         "foreign_key": [
>> >           "AREAID"
>> >         ]
>> >       },
>> >       "hierarchy": [
>> >         {
>> >           "level": "1",
>> >           "column": "NAME"
>> >         },
>> >         {
>> >           "level": "2",
>> >           "column": "COUNTRY"
>> >         },
>> >         {
>> >           "level": "3",
>> >           "column": "PROVINCE"
>> >         },
>> >         {
>> >           "level": "4",
>> >           "column": "CITY"
>> >         }
>> >       ],
>> >       "table": "AREA",
>> >       "column": null,
>> >       "datatype": null,
>> >       "derived": null
>> >     },
>> >     {
>> >       "id": 2,
>> >       "name": "RESOURCEFACT.RESOURCEID",
>> >       "join": null,
>> >       "hierarchy": null,
>> >       "table": "RESOURCEFACT",
>> >       "column": "RESOURCEID",
>> >       "datatype": null,
>> >       "derived": null
>> >     },
>> >     {
>> >       "id": 3,
>> >       "name": "RESOURCEFACT.INDICATORID",
>> >       "join": null,
>> >       "hierarchy": null,
>> >       "table": "RESOURCEFACT",
>> >       "column": "INDICATORID",
>> >       "datatype": null,
>> >       "derived": null
>> >     },
>> >     {
>> >       "id": 4,
>> >       "name": "RESOURCEFACT.COLLECTIONTIME",
>> >       "join": null,
>> >       "hierarchy": null,
>> >       "table": "RESOURCEFACT",
>> >       "column": "COLLECTIONTIME",
>> >       "datatype": null,
>> >       "derived": null
>> >     },
>> >     {
>> >       "id": 5,
>> >       "name": "RESOURCEFACT.PERIOD",
>> >       "join": null,
>> >       "hierarchy": null,
>> >       "table": "RESOURCEFACT",
>> >       "column": "PERIOD",
>> >       "datatype": null,
>> >       "derived": null
>> >     },
>> >     {
>> >       "id": 6,
>> >       "name": "RESOURCEFACT.VALUE",
>> >       "join": null,
>> >       "hierarchy": null,
>> >       "table": "RESOURCEFACT",
>> >       "column": "VALUE",
>> >       "datatype": null,
>> >       "derived": null
>> >     },
>> >     {
>> >       "id": 7,
>> >       "name": "RESOURCEFACT.AREAID",
>> >       "join": null,
>> >       "hierarchy": null,
>> >       "table": "RESOURCEFACT",
>> >       "column": "AREAID",
>> >       "datatype": null,
>> >       "derived": null
>> >     },
>> >     {
>> >       "id": 8,
>> >       "name": "RESOURCEFACT.CUSTOMERID",
>> >       "join": null,
>> >       "hierarchy": null,
>> >       "table": "RESOURCEFACT",
>> >       "column": "CUSTOMERID",
>> >       "datatype": null,
>> >       "derived": null
>> >     },
>> >     {
>> >       "id": 9,
>> >       "name": "USER_DERIVED",
>> >       "join": {
>> >         "type": "left",
>> >         "primary_key": [
>> >           "USERID"
>> >         ],
>> >         "foreign_key": [
>> >           "CUSTOMERID"
>> >         ]
>> >       },
>> >       "hierarchy": null,
>> >       "table": "USER",
>> >       "column": "{FK}",
>> >       "datatype": null,
>> >       "derived": [
>> >         "USERID",
>> >         "NAME"
>> >       ]
>> >     },
>> >     {
>> >       "id": 10,
>> >       "name": "KPI_DERIVED",
>> >       "join": {
>> >         "type": "left",
>> >         "primary_key": [
>> >           "INDICATORID"
>> >         ],
>> >         "foreign_key": [
>> >           "INDICATORID"
>> >         ]
>> >       },
>> >       "hierarchy": null,
>> >       "table": "KPI",
>> >       "column": "{FK}",
>> >       "datatype": null,
>> >       "derived": [
>> >         "INDICATORID",
>> >         "NAME"
>> >       ]
>> >     },
>> >     {
>> >       "id": 11,
>> >       "name": "AREA_DERIVED",
>> >       "join": {
>> >         "type": "left",
>> >         "primary_key": [
>> >           "AREAID"
>> >         ],
>> >         "foreign_key": [
>> >           "AREAID"
>> >         ]
>> >       },
>> >       "hierarchy": null,
>> >       "table": "AREA",
>> >       "column": "{FK}",
>> >       "datatype": null,
>> >       "derived": [
>> >         "AREAID"
>> >       ]
>> >     }
>> >   ],
>> >   "measures": [
>> >     {
>> >       "id": 1,
>> >       "name": "_COUNT_",
>> >       "function": {
>> >         "expression": "COUNT",
>> >         "parameter": {
>> >           "type": "constant",
>> >           "value": "1"
>> >         },
>> >         "returntype": "bigint"
>> >       },
>> >       "dependent_measure_ref": null
>> >     },
>> >     {
>> >       "id": 2,
>> >       "name": "SUMVALUE",
>> >       "function": {
>> >         "expression": "SUM",
>> >         "parameter": {
>> >           "type": "column",
>> >           "value": "VALUE"
>> >         },
>> >         "returntype": "bigint"
>> >       },
>> >       "dependent_measure_ref": null
>> >     }
>> >   ],
>> >   "rowkey": {
>> >     "rowkey_columns": [
>> >       {
>> >         "column": "NAME",
>> >         "length": 0,
>> >         "dictionary": "true",
>> >         "mandatory": false
>> >       },
>> >       {
>> >         "column": "COUNTRY",
>> >         "length": 0,
>> >         "dictionary": "true",
>> >         "mandatory": false
>> >       },
>> >       {
>> >         "column": "PROVINCE",
>> >         "length": 0,
>> >         "dictionary": "true",
>> >         "mandatory": false
>> >       },
>> >       {
>> >         "column": "CITY",
>> >         "length": 0,
>> >         "dictionary": "true",
>> >         "mandatory": false
>> >       },
>> >       {
>> >         "column": "RESOURCEID",
>> >         "length": 0,
>> >         "dictionary": "true",
>> >         "mandatory": false
>> >       },
>> >       {
>> >         "column": "INDICATORID",
>> >         "length": 0,
>> >         "dictionary": "true",
>> >         "mandatory": false
>> >       },
>> >       {
>> >         "column": "COLLECTIONTIME",
>> >         "length": 0,
>> >         "dictionary": "true",
>> >         "mandatory": false
>> >       },
>> >       {
>> >         "column": "PERIOD",
>> >         "length": 0,
>> >         "dictionary": "true",
>> >         "mandatory": false
>> >       },
>> >       {
>> >         "column": "VALUE",
>> >         "length": 0,
>> >         "dictionary": "true",
>> >         "mandatory": false
>> >       },
>> >       {
>> >         "column": "AREAID",
>> >         "length": 0,
>> >         "dictionary": "true",
>> >         "mandatory": false
>> >       },
>> >       {
>> >         "column": "CUSTOMERID",
>> >         "length": 0,
>> >         "dictionary": "true",
>> >         "mandatory": false
>> >       }
>> >     ],
>> >     "aggregation_groups": [
>> >       [
>> >         "RESOURCEID",
>> >         "INDICATORID",
>> >         "COLLECTIONTIME",
>> >         "PERIOD",
>> >         "VALUE",
>> >         "AREAID",
>> >         "CUSTOMERID"
>> >       ],
>> >       [
>> >         "NAME",
>> >         "COUNTRY",
>> >         "PROVINCE",
>> >         "CITY"
>> >       ]
>> >     ]
>> >   },
>> >   "signature": "1CSqztliEqN4l0cOPrkFdA==",
>> >   "capacity": "MEDIUM",
>> >   "last_modified": 1425587998627,
>> >   "fact_table": "RESOURCEFACT",
>> >   "null_string": null,
>> >   "filter_condition": null,
>> >   "cube_partition_desc": {
>> >     "partition_date_column": null,
>> >     "partition_date_start": 0,
>> >     "cube_partition_type": "APPEND"
>> >   },
>> >   "hbase_mapping": {
>> >     "column_family": [
>> >       {
>> >         "name": "F1",
>> >         "columns": [
>> >           {
>> >             "qualifier": "M",
>> >             "measure_refs": [
>> >               "_COUNT_",
>> >               "SUMVALUE"
>> >             ]
>> >           }
>> >         ]
>> >       }
>> >     ]
>> >   },
>> >   "notify_list": []
>> > }
>> >
>> > Regards,
>> > Santosh Akhilesh
>> > Bangalore R&D
>> > HUAWEI TECHNOLOGIES CO.,LTD.
>> >
>> > www.huawei.com
>> >
>> > ------------------------------------------------------------
>> -------------------------------------------------------------------------
>> > This e-mail and its attachments contain confidential information from
>> > HUAWEI, which
>> > is intended only for the person or entity whose address is listed above.
>> > Any use of the
>> > information contained herein in any way (including, but not limited to,
>> > total or partial
>> > disclosure, reproduction, or dissemination) by persons other than the
>> > intended
>> > recipient(s) is prohibited. If you receive this e-mail in error, please
>> > notify the sender by
>> > phone or email immediately and delete it!
>> >
>> > ________________________________________
>> > From: Santoshakhilesh
>> > Sent: Thursday, March 05, 2015 8:36 PM
>> > To: [email protected]
>> > Subject: RE: get wrong result for the query(both in kyin-0.71 and
>> > kylin-0.6x)
>> >
>> > hi dong ,
>> >   I am facing same issue when I run a count(*) on a query I get correct
>> > result
>> >   But when I do select * on same query I get only 1 result , but there
>> are
>> > 1000 records as shown by client.
>> >
>> > I am not getting following notifications  , I am using 0.6.5
>> >
>> > *Note: Current results are partial, please click 'Show All' button to
>> get
>> > all results.*
>> > Click the "Show All" button to disable the "AcceptPartialResults" mode,
>> and
>> > you'll get a right result.
>> >
>> >
>> > Regards,
>> > Santosh Akhilesh
>> > Bangalore R&D
>> > HUAWEI TECHNOLOGIES CO.,LTD.
>> >
>> > www.huawei.com
>> >
>> > ------------------------------------------------------------
>> -------------------------------------------------------------------------
>> > This e-mail and its attachments contain confidential information from
>> > HUAWEI, which
>> > is intended only for the person or entity whose address is listed above.
>> > Any use of the
>> > information contained herein in any way (including, but not limited to,
>> > total or partial
>> > disclosure, reproduction, or dissemination) by persons other than the
>> > intended
>> > recipient(s) is prohibited. If you receive this e-mail in error, please
>> > notify the sender by
>> > phone or email immediately and delete it!
>> >
>> > ________________________________________
>> > From: hongbin ma [[email protected]]
>> > Sent: Thursday, March 05, 2015 11:55 AM
>> > To: [email protected]
>> > Subject: Re: get wrong result for the query(both in kyin-0.71 and
>> > kylin-0.6x)
>> >
>> > hi dong,
>> >
>> > we looked into it, here's the reason:
>> >
>> > By default if you're making queries on the web client, a mode called
>> > "AcceptPartialResults" is enabled, this is a protection mechanism that
>> > will only return part of the results to reduce server overhead.
>> Honestly it
>> > might hurt the correctness of order by queries.
>> >
>> > If you're seeking 100% correctness, after running the query you will
>> find a
>> > notification:
>> > *Note: Current results are partial, please click 'Show All' button to
>> get
>> > all results.*
>> > Click the "Show All" button to disable the "AcceptPartialResults" mode,
>> and
>> > you'll get a right result.
>> >
>> > Notice "AcceptPartialResults" is only enabled by default at web client,
>> > you'll not meet such problems if you're using JDBC, ODBC or standard
>> REST
>> > API.
>> >
>> > thanks
>> > hongbin
>> >
>> >
>> > On Thu, Mar 5, 2015 at 11:33 AM, dong wang <[email protected]>
>> wrote:
>> >
>> > > Sorry hongbin, it's my fault, for the 2 question, the row count is
>> LESS
>> > > THAN 50000
>> > >
>> > > 2015-03-05 11:28 GMT+08:00 dong wang <[email protected]>:
>> > >
>> > > > HI hongbin,
>> > > > 1, yes, as you guessed,  the result of the above query is CORRECT
>> if we
>> > > > use ASC order, thus, the issue only occurs when DESC order.
>> > > > 2, the total row count of the result is GREATER THAN 50000
>> definitely,
>> > > > even much more
>> > > >
>> > > > 2015-03-05 11:06 GMT+08:00 hongbin ma <[email protected]>:
>> > > >
>> > > >> for #2 to #4, will Kylin return correct answers if order by
>> ascending
>> > > >> order?
>> > > >> And what it the total result count of #4?
>> > > >>
>> > > >> On Thu, Mar 5, 2015 at 10:56 AM, Zhou, Qianhao <[email protected]>
>> > > wrote:
>> > > >>
>> > > >> > #1 when add condition for date type, you should add date before
>> the
>> > > time
>> > > >> > string
>> > > >> >
>> > > >> > the sql should be like:
>> > > >> > select my_date, sum(f1), sum(f2), sum(f3) from test where
>> > > >> > my_date >= date'2013-01-01' and my_date <= date'2015-01-31'
>> > > >> >
>> > > >> >
>> > > >> >
>> > > >> > Best Regard
>> > > >> > Zhou QianHao
>> > > >> >
>> > > >> >
>> > > >> >
>> > > >> >
>> > > >> >
>> > > >> > On 3/5/15, 10:40 AM, "dong wang" <[email protected]> wrote:
>> > > >> >
>> > > >> > >Hi all, I have experienced the wrong result for quite a long
>> time
>> > > since
>> > > >> > >kylin-0.6x, thus, really hope that someone can do a help.
>> > > >> > >
>> > > >> > >CUBE:
>> > > >> > >1, the cube is based on just one big fact table. there are 10
>> > > >> attributes,
>> > > >> > >and 3 metrics.
>> > > >> > >for exmaple: my_date, col2, col3, col4, col5, col6, col7, col8,
>> > col9,
>> > > >> > >col10, sum(f1), sum(f2), sum(f2)
>> > > >> > >
>> > > >> > >2, my_date, col2 is normal columns, and my_date is the partition
>> > > >> > >attribute,
>> > > >> > >which is date type in HIVE,
>> > > >> > >
>> > > >> > >3, (col3, col4), (col5, col6), (col7, col8), (col9, col10) are 4
>> > > >> > >hirerarchies, and col3, col5, col7, col9 are parents of its
>> > hirarchy
>> > > >> > >respectively.
>> > > >> > >
>> > > >> > >4, the aggregation groups are: my_date, col2, col3, col4, col5,
>> > col6,
>> > > >> > >col7,
>> > > >> > >col8, col9, col10
>> > > >> > >
>> > > >> > >5, the rowkeys are: my_date, col2, col3, col4, col5, col6, col7,
>> > > col8,
>> > > >> > >col9, col10
>> > > >> > >
>> > > >> > >
>> > > >> > >NOTE THAT: in the cube, the range of my_date is between
>> 2013-10-01
>> > > and
>> > > >> > >2015-03-04
>> > > >> > >
>> > > >> > >
>> > > >> > >2 kinds of issue as below:
>> > > >> > >
>> > > >> > >QUERY:
>> > > >> > >1, select my_date, sum(f1), sum(f2), sum(f3) from test where
>> > > >> > >my_date>='2013-01-01' and my_date<= '2015-01-31'
>> > > >> > >note "my_date" is HIVE date type. and the error: Cannot apply
>> '>='
>> > to
>> > > >> > >arguments of type '<DATE> >= <CHAR(10)>'. Supported form(s):
>> > > >> > >'<COMPARABLE_TYPE> >= <COMPARABLE_TYPE>' while executing SQL,
>> how
>> > > >> should I
>> > > >> > >modify the SQL?
>> > > >> > >
>> > > >> > >2, select my_date, col3, col4, sum(f1), sum(f2), sum(f3) from
>> test
>> > > >> group
>> > > >> > >by
>> > > >> > >my_date, col3, col4, order by my_date desc, col3 desc, col4
>> desc;
>> > > >> > >
>> > > >> > >suppose that the actual row count of the result is GREATER THAN
>> the
>> > > >> > >default
>> > > >> > >limit 50000;
>> > > >> > >
>> > > >> > >then, the result is:
>> > > >> > >
>> > > >> > >my_date, col3, col4, sum(f1), sum(f2), sum(f3)
>> > > >> > >2013-10-23
>> > > >> > >2013-10-23
>> > > >> > >...
>> > > >> > >2013-10-22
>> > > >> > >2013-10-22
>> > > >> > >...
>> > > >> > >2013-10-21
>> > > >> > >2013-10-21
>> > > >> > >...
>> > > >> > >...
>> > > >> > >...
>> > > >> > >2013-10-01
>> > > >> > >2013-10-01(note 2013-10-01 is the earliest day of the cube)
>> > > >> > >
>> > > >> > >totally, there are 50000 rows for the result,  but it is obvious
>> > that
>> > > >> the
>> > > >> > >order by and limit doesn't work correctly.
>> > > >> > >
>> > > >> > >3, select my_date, col3, col4, sum(f1), sum(f2), sum(f3) from
>> test
>> > > >> group
>> > > >> > >by
>> > > >> > >my_date, col3, col4, order by my_date desc, col3 desc, col4 desc
>> > > LIMIT
>> > > >> 100
>> > > >> > >
>> > > >> > >just adding "LIMIT 100" to the tail of the 2nd query above,
>> then,
>> > the
>> > > >> > >result is:
>> > > >> > >
>> > > >> > >my_date, col3, col4, sum(f1), sum(f2), sum(f3)
>> > > >> > >2013-10-05
>> > > >> > >2013-10-05
>> > > >> > >....
>> > > >> > >2013-10-05
>> > > >> > >
>> > > >> > >totally, there are 100 rows. all of which are "2013-10-05",
>> > > apparently,
>> > > >> > >the
>> > > >> > >result is not correct as well.
>> > > >> > >
>> > > >> > >4, select my_date, sum(f1), sum(f2), sum(f3) from test group by
>> > > my_date
>> > > >> > >order by my_date desc LIMIT 100
>> > > >> > >if I just add "my_date" as "aggreation level" as above, the
>> result
>> > of
>> > > >> the
>> > > >> > >4th query is correct.
>> > > >> >
>> > > >> >
>> > > >>
>> > > >
>> > > >
>> > >
>> >
>>
>

Re: get wrong result for the query(both in kyin-0.71 and kylin-0.6x)

Reply via email to