hi santoshakhilesh,

In 0.6 versions, Kylin is only meant for aggregated queries, not detailed
level queries. Try modifying your query by adding some groupbys.

I believe "select * from factable" will work for your case.
The reason why you can support "select * from facttable" is that when Kylin
detects such query pattern, it will secretly add all the dimension columns
and metric columns on fact table to the query and the query will be treated
like:

select dim1,dim2,dim3.... dimN, metric1,metric2...metricN from facttable
groupby dim1, dim2, dim3,,,dimN

So the result of "select * from factable" is aggregated if two rows in fact
table share the same dimension values.

In 0.7 version which we introduced a new module called "inverted index",
with data organized in "inverted index" Kylin can better answer detailed
level queries (By MPP instead of cubing). However this is still under
development and not ready for community use.

thanks
hongbin


On Thu, Mar 5, 2015 at 11:34 PM, Santoshakhilesh <
[email protected]> wrote:

> Hi All ,
>
> I am facing couple of issue regarding my queries in 0.6
>
> 1. Query result in web ui is not correct , following return only one
> record , if I run count(*) on same join
>     result is 1000 which is correct
> SELECT
> AREA.NAME
> ,AREA.COUNTRY
> ,AREA.PROVINCE
> ,AREA.CITY
> ,RESOURCEFACT.RESOURCEID
> ,RESOURCEFACT.INDICATORID
> ,RESOURCEFACT.COLLECTIONTIME
> ,RESOURCEFACT.PERIOD
> ,RESOURCEFACT.AREAID
> ,RESOURCEFACT.CUSTOMERID
> FROM RESOURCEFACT
> LEFT JOIN AREA
> ON RESOURCEFACT.AREAID = AREA.AREAID
> LEFT JOIN KPI
> ON RESOURCEFACT.INDICATORID = KPI.INDICATORID
>
> 2. The sample query on cube wizard is following; I have another table User
> when I run this query after cube build Kylin throws exception that it cant
> find user table. even select count(*) on User table fails. But the table is
> there in hive. RESOURCEFACT.VALUE this column cant be recognized by sql
> query on web ui.
>
> SELECT
> AREA.NAME
> ,AREA.COUNTRY
> ,AREA.PROVINCE
> ,AREA.CITY
> ,RESOURCEFACT.RESOURCEID
> ,RESOURCEFACT.INDICATORID
> ,RESOURCEFACT.COLLECTIONTIME
> ,RESOURCEFACT.PERIOD
> ,RESOURCEFACT.VALUE
> ,RESOURCEFACT.AREAID
> ,RESOURCEFACT.CUSTOMERID
> FROM RESOURCEFACT
> LEFT JOIN AREA
> ON RESOURCEFACT.AREAID = AREA.AREAID
> LEFT JOIN USER
> ON RESOURCEFACT.CUSTOMERID = USER.USERID
> LEFT JOIN KPI
> ON RESOURCEFACT.INDICATORID = KPI.INDICATORID
>
> MY Cube Json is as below.
> {
>   "uuid": "95204f18-24f7-43e5-bead-368900f74f9c",
>   "name": "secondcube",
>   "description": "",
>   "dimensions": [
>     {
>       "id": 1,
>       "name": "RESOURCEGROUP",
>       "join": {
>         "type": "left",
>         "primary_key": [
>           "AREAID"
>         ],
>         "foreign_key": [
>           "AREAID"
>         ]
>       },
>       "hierarchy": [
>         {
>           "level": "1",
>           "column": "NAME"
>         },
>         {
>           "level": "2",
>           "column": "COUNTRY"
>         },
>         {
>           "level": "3",
>           "column": "PROVINCE"
>         },
>         {
>           "level": "4",
>           "column": "CITY"
>         }
>       ],
>       "table": "AREA",
>       "column": null,
>       "datatype": null,
>       "derived": null
>     },
>     {
>       "id": 2,
>       "name": "RESOURCEFACT.RESOURCEID",
>       "join": null,
>       "hierarchy": null,
>       "table": "RESOURCEFACT",
>       "column": "RESOURCEID",
>       "datatype": null,
>       "derived": null
>     },
>     {
>       "id": 3,
>       "name": "RESOURCEFACT.INDICATORID",
>       "join": null,
>       "hierarchy": null,
>       "table": "RESOURCEFACT",
>       "column": "INDICATORID",
>       "datatype": null,
>       "derived": null
>     },
>     {
>       "id": 4,
>       "name": "RESOURCEFACT.COLLECTIONTIME",
>       "join": null,
>       "hierarchy": null,
>       "table": "RESOURCEFACT",
>       "column": "COLLECTIONTIME",
>       "datatype": null,
>       "derived": null
>     },
>     {
>       "id": 5,
>       "name": "RESOURCEFACT.PERIOD",
>       "join": null,
>       "hierarchy": null,
>       "table": "RESOURCEFACT",
>       "column": "PERIOD",
>       "datatype": null,
>       "derived": null
>     },
>     {
>       "id": 6,
>       "name": "RESOURCEFACT.VALUE",
>       "join": null,
>       "hierarchy": null,
>       "table": "RESOURCEFACT",
>       "column": "VALUE",
>       "datatype": null,
>       "derived": null
>     },
>     {
>       "id": 7,
>       "name": "RESOURCEFACT.AREAID",
>       "join": null,
>       "hierarchy": null,
>       "table": "RESOURCEFACT",
>       "column": "AREAID",
>       "datatype": null,
>       "derived": null
>     },
>     {
>       "id": 8,
>       "name": "RESOURCEFACT.CUSTOMERID",
>       "join": null,
>       "hierarchy": null,
>       "table": "RESOURCEFACT",
>       "column": "CUSTOMERID",
>       "datatype": null,
>       "derived": null
>     },
>     {
>       "id": 9,
>       "name": "USER_DERIVED",
>       "join": {
>         "type": "left",
>         "primary_key": [
>           "USERID"
>         ],
>         "foreign_key": [
>           "CUSTOMERID"
>         ]
>       },
>       "hierarchy": null,
>       "table": "USER",
>       "column": "{FK}",
>       "datatype": null,
>       "derived": [
>         "USERID",
>         "NAME"
>       ]
>     },
>     {
>       "id": 10,
>       "name": "KPI_DERIVED",
>       "join": {
>         "type": "left",
>         "primary_key": [
>           "INDICATORID"
>         ],
>         "foreign_key": [
>           "INDICATORID"
>         ]
>       },
>       "hierarchy": null,
>       "table": "KPI",
>       "column": "{FK}",
>       "datatype": null,
>       "derived": [
>         "INDICATORID",
>         "NAME"
>       ]
>     },
>     {
>       "id": 11,
>       "name": "AREA_DERIVED",
>       "join": {
>         "type": "left",
>         "primary_key": [
>           "AREAID"
>         ],
>         "foreign_key": [
>           "AREAID"
>         ]
>       },
>       "hierarchy": null,
>       "table": "AREA",
>       "column": "{FK}",
>       "datatype": null,
>       "derived": [
>         "AREAID"
>       ]
>     }
>   ],
>   "measures": [
>     {
>       "id": 1,
>       "name": "_COUNT_",
>       "function": {
>         "expression": "COUNT",
>         "parameter": {
>           "type": "constant",
>           "value": "1"
>         },
>         "returntype": "bigint"
>       },
>       "dependent_measure_ref": null
>     },
>     {
>       "id": 2,
>       "name": "SUMVALUE",
>       "function": {
>         "expression": "SUM",
>         "parameter": {
>           "type": "column",
>           "value": "VALUE"
>         },
>         "returntype": "bigint"
>       },
>       "dependent_measure_ref": null
>     }
>   ],
>   "rowkey": {
>     "rowkey_columns": [
>       {
>         "column": "NAME",
>         "length": 0,
>         "dictionary": "true",
>         "mandatory": false
>       },
>       {
>         "column": "COUNTRY",
>         "length": 0,
>         "dictionary": "true",
>         "mandatory": false
>       },
>       {
>         "column": "PROVINCE",
>         "length": 0,
>         "dictionary": "true",
>         "mandatory": false
>       },
>       {
>         "column": "CITY",
>         "length": 0,
>         "dictionary": "true",
>         "mandatory": false
>       },
>       {
>         "column": "RESOURCEID",
>         "length": 0,
>         "dictionary": "true",
>         "mandatory": false
>       },
>       {
>         "column": "INDICATORID",
>         "length": 0,
>         "dictionary": "true",
>         "mandatory": false
>       },
>       {
>         "column": "COLLECTIONTIME",
>         "length": 0,
>         "dictionary": "true",
>         "mandatory": false
>       },
>       {
>         "column": "PERIOD",
>         "length": 0,
>         "dictionary": "true",
>         "mandatory": false
>       },
>       {
>         "column": "VALUE",
>         "length": 0,
>         "dictionary": "true",
>         "mandatory": false
>       },
>       {
>         "column": "AREAID",
>         "length": 0,
>         "dictionary": "true",
>         "mandatory": false
>       },
>       {
>         "column": "CUSTOMERID",
>         "length": 0,
>         "dictionary": "true",
>         "mandatory": false
>       }
>     ],
>     "aggregation_groups": [
>       [
>         "RESOURCEID",
>         "INDICATORID",
>         "COLLECTIONTIME",
>         "PERIOD",
>         "VALUE",
>         "AREAID",
>         "CUSTOMERID"
>       ],
>       [
>         "NAME",
>         "COUNTRY",
>         "PROVINCE",
>         "CITY"
>       ]
>     ]
>   },
>   "signature": "1CSqztliEqN4l0cOPrkFdA==",
>   "capacity": "MEDIUM",
>   "last_modified": 1425587998627,
>   "fact_table": "RESOURCEFACT",
>   "null_string": null,
>   "filter_condition": null,
>   "cube_partition_desc": {
>     "partition_date_column": null,
>     "partition_date_start": 0,
>     "cube_partition_type": "APPEND"
>   },
>   "hbase_mapping": {
>     "column_family": [
>       {
>         "name": "F1",
>         "columns": [
>           {
>             "qualifier": "M",
>             "measure_refs": [
>               "_COUNT_",
>               "SUMVALUE"
>             ]
>           }
>         ]
>       }
>     ]
>   },
>   "notify_list": []
> }
>
> Regards,
> Santosh Akhilesh
> Bangalore R&D
> HUAWEI TECHNOLOGIES CO.,LTD.
>
> www.huawei.com
>
> -------------------------------------------------------------------------------------------------------------------------------------
> This e-mail and its attachments contain confidential information from
> HUAWEI, which
> is intended only for the person or entity whose address is listed above.
> Any use of the
> information contained herein in any way (including, but not limited to,
> total or partial
> disclosure, reproduction, or dissemination) by persons other than the
> intended
> recipient(s) is prohibited. If you receive this e-mail in error, please
> notify the sender by
> phone or email immediately and delete it!
>
> ________________________________________
> From: Santoshakhilesh
> Sent: Thursday, March 05, 2015 8:36 PM
> To: [email protected]
> Subject: RE: get wrong result for the query(both in kyin-0.71 and
> kylin-0.6x)
>
> hi dong ,
>   I am facing same issue when I run a count(*) on a query I get correct
> result
>   But when I do select * on same query I get only 1 result , but there are
> 1000 records as shown by client.
>
> I am not getting following notifications  , I am using 0.6.5
>
> *Note: Current results are partial, please click 'Show All' button to get
> all results.*
> Click the "Show All" button to disable the "AcceptPartialResults" mode, and
> you'll get a right result.
>
>
> Regards,
> Santosh Akhilesh
> Bangalore R&D
> HUAWEI TECHNOLOGIES CO.,LTD.
>
> www.huawei.com
>
> -------------------------------------------------------------------------------------------------------------------------------------
> This e-mail and its attachments contain confidential information from
> HUAWEI, which
> is intended only for the person or entity whose address is listed above.
> Any use of the
> information contained herein in any way (including, but not limited to,
> total or partial
> disclosure, reproduction, or dissemination) by persons other than the
> intended
> recipient(s) is prohibited. If you receive this e-mail in error, please
> notify the sender by
> phone or email immediately and delete it!
>
> ________________________________________
> From: hongbin ma [[email protected]]
> Sent: Thursday, March 05, 2015 11:55 AM
> To: [email protected]
> Subject: Re: get wrong result for the query(both in kyin-0.71 and
> kylin-0.6x)
>
> hi dong,
>
> we looked into it, here's the reason:
>
> By default if you're making queries on the web client, a mode called
> "AcceptPartialResults" is enabled​, this is a protection mechanism that
> will only return part of the results to reduce server overhead. Honestly it
> might hurt the correctness of order by queries.
>
> If you're seeking 100% correctness, after running the query you will find a
> notification:
> *Note: Current results are partial, please click 'Show All' button to get
> all results.*
> Click the "Show All" button to disable the "AcceptPartialResults" mode, and
> you'll get a right result.
>
> Notice "AcceptPartialResults" is only enabled by default at web client,
> you'll not meet such problems if you're using JDBC, ODBC or standard REST
> API.
>
> thanks
> hongbin
>
>
> On Thu, Mar 5, 2015 at 11:33 AM, dong wang <[email protected]> wrote:
>
> > Sorry hongbin, it's my fault, for the 2 question, the row count is LESS
> > THAN 50000
> >
> > 2015-03-05 11:28 GMT+08:00 dong wang <[email protected]>:
> >
> > > HI hongbin,
> > > 1, yes, as you guessed,  the result of the above query is CORRECT if we
> > > use ASC order, thus, the issue only occurs when DESC order.
> > > 2, the total row count of the result is GREATER THAN 50000 definitely,
> > > even much more
> > >
> > > 2015-03-05 11:06 GMT+08:00 hongbin ma <[email protected]>:
> > >
> > >> for #2 to #4, will Kylin return correct answers if order by ascending
> > >> order?
> > >> And what it the total result count of #4?
> > >>
> > >> On Thu, Mar 5, 2015 at 10:56 AM, Zhou, Qianhao <[email protected]>
> > wrote:
> > >>
> > >> > #1 when add condition for date type, you should add date before the
> > time
> > >> > string
> > >> >
> > >> > the sql should be like:
> > >> > select my_date, sum(f1), sum(f2), sum(f3) from test where
> > >> > my_date >= date'2013-01-01' and my_date <= date'2015-01-31'
> > >> >
> > >> >
> > >> >
> > >> > Best Regard
> > >> > Zhou QianHao
> > >> >
> > >> >
> > >> >
> > >> >
> > >> >
> > >> > On 3/5/15, 10:40 AM, "dong wang" <[email protected]> wrote:
> > >> >
> > >> > >Hi all, I have experienced the wrong result for quite a long time
> > since
> > >> > >kylin-0.6x, thus, really hope that someone can do a help.
> > >> > >
> > >> > >CUBE:
> > >> > >1, the cube is based on just one big fact table. there are 10
> > >> attributes,
> > >> > >and 3 metrics.
> > >> > >for exmaple: my_date, col2, col3, col4, col5, col6, col7, col8,
> col9,
> > >> > >col10, sum(f1), sum(f2), sum(f2)
> > >> > >
> > >> > >2, my_date, col2 is normal columns, and my_date is the partition
> > >> > >attribute,
> > >> > >which is date type in HIVE,
> > >> > >
> > >> > >3, (col3, col4), (col5, col6), (col7, col8), (col9, col10) are 4
> > >> > >hirerarchies, and col3, col5, col7, col9 are parents of its
> hirarchy
> > >> > >respectively.
> > >> > >
> > >> > >4, the aggregation groups are: my_date, col2, col3, col4, col5,
> col6,
> > >> > >col7,
> > >> > >col8, col9, col10
> > >> > >
> > >> > >5, the rowkeys are: my_date, col2, col3, col4, col5, col6, col7,
> > col8,
> > >> > >col9, col10
> > >> > >
> > >> > >
> > >> > >NOTE THAT: in the cube, the range of my_date is between 2013-10-01
> > and
> > >> > >2015-03-04
> > >> > >
> > >> > >
> > >> > >2 kinds of issue as below:
> > >> > >
> > >> > >QUERY:
> > >> > >1, select my_date, sum(f1), sum(f2), sum(f3) from test where
> > >> > >my_date>='2013-01-01' and my_date<= '2015-01-31'
> > >> > >note "my_date" is HIVE date type. and the error: Cannot apply '>='
> to
> > >> > >arguments of type '<DATE> >= <CHAR(10)>'. Supported form(s):
> > >> > >'<COMPARABLE_TYPE> >= <COMPARABLE_TYPE>' while executing SQL, how
> > >> should I
> > >> > >modify the SQL?
> > >> > >
> > >> > >2, select my_date, col3, col4, sum(f1), sum(f2), sum(f3) from test
> > >> group
> > >> > >by
> > >> > >my_date, col3, col4, order by my_date desc, col3 desc, col4 desc;
> > >> > >
> > >> > >suppose that the actual row count of the result is GREATER THAN the
> > >> > >default
> > >> > >limit 50000;
> > >> > >
> > >> > >then, the result is:
> > >> > >
> > >> > >my_date, col3, col4, sum(f1), sum(f2), sum(f3)
> > >> > >2013-10-23
> > >> > >2013-10-23
> > >> > >...
> > >> > >2013-10-22
> > >> > >2013-10-22
> > >> > >...
> > >> > >2013-10-21
> > >> > >2013-10-21
> > >> > >...
> > >> > >...
> > >> > >...
> > >> > >2013-10-01
> > >> > >2013-10-01(note 2013-10-01 is the earliest day of the cube)
> > >> > >
> > >> > >totally, there are 50000 rows for the result,  but it is obvious
> that
> > >> the
> > >> > >order by and limit doesn't work correctly.
> > >> > >
> > >> > >3, select my_date, col3, col4, sum(f1), sum(f2), sum(f3) from test
> > >> group
> > >> > >by
> > >> > >my_date, col3, col4, order by my_date desc, col3 desc, col4 desc
> > LIMIT
> > >> 100
> > >> > >
> > >> > >just adding "LIMIT 100" to the tail of the 2nd query above, then,
> the
> > >> > >result is:
> > >> > >
> > >> > >my_date, col3, col4, sum(f1), sum(f2), sum(f3)
> > >> > >2013-10-05
> > >> > >2013-10-05
> > >> > >....
> > >> > >2013-10-05
> > >> > >
> > >> > >totally, there are 100 rows. all of which are "2013-10-05",
> > apparently,
> > >> > >the
> > >> > >result is not correct as well.
> > >> > >
> > >> > >4, select my_date, sum(f1), sum(f2), sum(f3) from test group by
> > my_date
> > >> > >order by my_date desc LIMIT 100
> > >> > >if I just add "my_date" as "aggreation level" as above, the result
> of
> > >> the
> > >> > >4th query is correct.
> > >> >
> > >> >
> > >>
> > >
> > >
> >
>

Reply via email to