Hi Hongbin, Ok, I will add group by and check. But can you please also check why my dim table user is not being recognised in query. And also I can't query if I add the column value which is in fact and I have added a measure over it. I had sent my cube request Json. Regards Santosh On Fri, 6 Mar 2015 at 7:50 am, hongbin ma <[email protected]> wrote:
> hi santoshakhilesh, > > In 0.6 versions, Kylin is only meant for aggregated queries, not detailed > level queries. Try modifying your query by adding some groupbys. > > I believe "select * from factable" will work for your case. > The reason why you can support "select * from facttable" is that when Kylin > detects such query pattern, it will secretly add all the dimension columns > and metric columns on fact table to the query and the query will be treated > like: > > select dim1,dim2,dim3.... dimN, metric1,metric2...metricN from facttable > groupby dim1, dim2, dim3,,,dimN > > So the result of "select * from factable" is aggregated if two rows in fact > table share the same dimension values. > > In 0.7 version which we introduced a new module called "inverted index", > with data organized in "inverted index" Kylin can better answer detailed > level queries (By MPP instead of cubing). However this is still under > development and not ready for community use. > > thanks > hongbin > > > On Thu, Mar 5, 2015 at 11:34 PM, Santoshakhilesh < > [email protected]> wrote: > > > Hi All , > > > > I am facing couple of issue regarding my queries in 0.6 > > > > 1. Query result in web ui is not correct , following return only one > > record , if I run count(*) on same join > > result is 1000 which is correct > > SELECT > > AREA.NAME > > ,AREA.COUNTRY > > ,AREA.PROVINCE > > ,AREA.CITY > > ,RESOURCEFACT.RESOURCEID > > ,RESOURCEFACT.INDICATORID > > ,RESOURCEFACT.COLLECTIONTIME > > ,RESOURCEFACT.PERIOD > > ,RESOURCEFACT.AREAID > > ,RESOURCEFACT.CUSTOMERID > > FROM RESOURCEFACT > > LEFT JOIN AREA > > ON RESOURCEFACT.AREAID = AREA.AREAID > > LEFT JOIN KPI > > ON RESOURCEFACT.INDICATORID = KPI.INDICATORID > > > > 2. The sample query on cube wizard is following; I have another table > User > > when I run this query after cube build Kylin throws exception that it > cant > > find user table. even select count(*) on User table fails. But the table > is > > there in hive. RESOURCEFACT.VALUE this column cant be recognized by sql > > query on web ui. > > > > SELECT > > AREA.NAME > > ,AREA.COUNTRY > > ,AREA.PROVINCE > > ,AREA.CITY > > ,RESOURCEFACT.RESOURCEID > > ,RESOURCEFACT.INDICATORID > > ,RESOURCEFACT.COLLECTIONTIME > > ,RESOURCEFACT.PERIOD > > ,RESOURCEFACT.VALUE > > ,RESOURCEFACT.AREAID > > ,RESOURCEFACT.CUSTOMERID > > FROM RESOURCEFACT > > LEFT JOIN AREA > > ON RESOURCEFACT.AREAID = AREA.AREAID > > LEFT JOIN USER > > ON RESOURCEFACT.CUSTOMERID = USER.USERID > > LEFT JOIN KPI > > ON RESOURCEFACT.INDICATORID = KPI.INDICATORID > > > > MY Cube Json is as below. > > { > > "uuid": "95204f18-24f7-43e5-bead-368900f74f9c", > > "name": "secondcube", > > "description": "", > > "dimensions": [ > > { > > "id": 1, > > "name": "RESOURCEGROUP", > > "join": { > > "type": "left", > > "primary_key": [ > > "AREAID" > > ], > > "foreign_key": [ > > "AREAID" > > ] > > }, > > "hierarchy": [ > > { > > "level": "1", > > "column": "NAME" > > }, > > { > > "level": "2", > > "column": "COUNTRY" > > }, > > { > > "level": "3", > > "column": "PROVINCE" > > }, > > { > > "level": "4", > > "column": "CITY" > > } > > ], > > "table": "AREA", > > "column": null, > > "datatype": null, > > "derived": null > > }, > > { > > "id": 2, > > "name": "RESOURCEFACT.RESOURCEID", > > "join": null, > > "hierarchy": null, > > "table": "RESOURCEFACT", > > "column": "RESOURCEID", > > "datatype": null, > > "derived": null > > }, > > { > > "id": 3, > > "name": "RESOURCEFACT.INDICATORID", > > "join": null, > > "hierarchy": null, > > "table": "RESOURCEFACT", > > "column": "INDICATORID", > > "datatype": null, > > "derived": null > > }, > > { > > "id": 4, > > "name": "RESOURCEFACT.COLLECTIONTIME", > > "join": null, > > "hierarchy": null, > > "table": "RESOURCEFACT", > > "column": "COLLECTIONTIME", > > "datatype": null, > > "derived": null > > }, > > { > > "id": 5, > > "name": "RESOURCEFACT.PERIOD", > > "join": null, > > "hierarchy": null, > > "table": "RESOURCEFACT", > > "column": "PERIOD", > > "datatype": null, > > "derived": null > > }, > > { > > "id": 6, > > "name": "RESOURCEFACT.VALUE", > > "join": null, > > "hierarchy": null, > > "table": "RESOURCEFACT", > > "column": "VALUE", > > "datatype": null, > > "derived": null > > }, > > { > > "id": 7, > > "name": "RESOURCEFACT.AREAID", > > "join": null, > > "hierarchy": null, > > "table": "RESOURCEFACT", > > "column": "AREAID", > > "datatype": null, > > "derived": null > > }, > > { > > "id": 8, > > "name": "RESOURCEFACT.CUSTOMERID", > > "join": null, > > "hierarchy": null, > > "table": "RESOURCEFACT", > > "column": "CUSTOMERID", > > "datatype": null, > > "derived": null > > }, > > { > > "id": 9, > > "name": "USER_DERIVED", > > "join": { > > "type": "left", > > "primary_key": [ > > "USERID" > > ], > > "foreign_key": [ > > "CUSTOMERID" > > ] > > }, > > "hierarchy": null, > > "table": "USER", > > "column": "{FK}", > > "datatype": null, > > "derived": [ > > "USERID", > > "NAME" > > ] > > }, > > { > > "id": 10, > > "name": "KPI_DERIVED", > > "join": { > > "type": "left", > > "primary_key": [ > > "INDICATORID" > > ], > > "foreign_key": [ > > "INDICATORID" > > ] > > }, > > "hierarchy": null, > > "table": "KPI", > > "column": "{FK}", > > "datatype": null, > > "derived": [ > > "INDICATORID", > > "NAME" > > ] > > }, > > { > > "id": 11, > > "name": "AREA_DERIVED", > > "join": { > > "type": "left", > > "primary_key": [ > > "AREAID" > > ], > > "foreign_key": [ > > "AREAID" > > ] > > }, > > "hierarchy": null, > > "table": "AREA", > > "column": "{FK}", > > "datatype": null, > > "derived": [ > > "AREAID" > > ] > > } > > ], > > "measures": [ > > { > > "id": 1, > > "name": "_COUNT_", > > "function": { > > "expression": "COUNT", > > "parameter": { > > "type": "constant", > > "value": "1" > > }, > > "returntype": "bigint" > > }, > > "dependent_measure_ref": null > > }, > > { > > "id": 2, > > "name": "SUMVALUE", > > "function": { > > "expression": "SUM", > > "parameter": { > > "type": "column", > > "value": "VALUE" > > }, > > "returntype": "bigint" > > }, > > "dependent_measure_ref": null > > } > > ], > > "rowkey": { > > "rowkey_columns": [ > > { > > "column": "NAME", > > "length": 0, > > "dictionary": "true", > > "mandatory": false > > }, > > { > > "column": "COUNTRY", > > "length": 0, > > "dictionary": "true", > > "mandatory": false > > }, > > { > > "column": "PROVINCE", > > "length": 0, > > "dictionary": "true", > > "mandatory": false > > }, > > { > > "column": "CITY", > > "length": 0, > > "dictionary": "true", > > "mandatory": false > > }, > > { > > "column": "RESOURCEID", > > "length": 0, > > "dictionary": "true", > > "mandatory": false > > }, > > { > > "column": "INDICATORID", > > "length": 0, > > "dictionary": "true", > > "mandatory": false > > }, > > { > > "column": "COLLECTIONTIME", > > "length": 0, > > "dictionary": "true", > > "mandatory": false > > }, > > { > > "column": "PERIOD", > > "length": 0, > > "dictionary": "true", > > "mandatory": false > > }, > > { > > "column": "VALUE", > > "length": 0, > > "dictionary": "true", > > "mandatory": false > > }, > > { > > "column": "AREAID", > > "length": 0, > > "dictionary": "true", > > "mandatory": false > > }, > > { > > "column": "CUSTOMERID", > > "length": 0, > > "dictionary": "true", > > "mandatory": false > > } > > ], > > "aggregation_groups": [ > > [ > > "RESOURCEID", > > "INDICATORID", > > "COLLECTIONTIME", > > "PERIOD", > > "VALUE", > > "AREAID", > > "CUSTOMERID" > > ], > > [ > > "NAME", > > "COUNTRY", > > "PROVINCE", > > "CITY" > > ] > > ] > > }, > > "signature": "1CSqztliEqN4l0cOPrkFdA==", > > "capacity": "MEDIUM", > > "last_modified": 1425587998627, > > "fact_table": "RESOURCEFACT", > > "null_string": null, > > "filter_condition": null, > > "cube_partition_desc": { > > "partition_date_column": null, > > "partition_date_start": 0, > > "cube_partition_type": "APPEND" > > }, > > "hbase_mapping": { > > "column_family": [ > > { > > "name": "F1", > > "columns": [ > > { > > "qualifier": "M", > > "measure_refs": [ > > "_COUNT_", > > "SUMVALUE" > > ] > > } > > ] > > } > > ] > > }, > > "notify_list": [] > > } > > > > Regards, > > Santosh Akhilesh > > Bangalore R&D > > HUAWEI TECHNOLOGIES CO.,LTD. > > > > www.huawei.com > > > > ------------------------------------------------------------ > ------------------------------------------------------------------------- > > This e-mail and its attachments contain confidential information from > > HUAWEI, which > > is intended only for the person or entity whose address is listed above. > > Any use of the > > information contained herein in any way (including, but not limited to, > > total or partial > > disclosure, reproduction, or dissemination) by persons other than the > > intended > > recipient(s) is prohibited. If you receive this e-mail in error, please > > notify the sender by > > phone or email immediately and delete it! > > > > ________________________________________ > > From: Santoshakhilesh > > Sent: Thursday, March 05, 2015 8:36 PM > > To: [email protected] > > Subject: RE: get wrong result for the query(both in kyin-0.71 and > > kylin-0.6x) > > > > hi dong , > > I am facing same issue when I run a count(*) on a query I get correct > > result > > But when I do select * on same query I get only 1 result , but there > are > > 1000 records as shown by client. > > > > I am not getting following notifications , I am using 0.6.5 > > > > *Note: Current results are partial, please click 'Show All' button to get > > all results.* > > Click the "Show All" button to disable the "AcceptPartialResults" mode, > and > > you'll get a right result. > > > > > > Regards, > > Santosh Akhilesh > > Bangalore R&D > > HUAWEI TECHNOLOGIES CO.,LTD. > > > > www.huawei.com > > > > ------------------------------------------------------------ > ------------------------------------------------------------------------- > > This e-mail and its attachments contain confidential information from > > HUAWEI, which > > is intended only for the person or entity whose address is listed above. > > Any use of the > > information contained herein in any way (including, but not limited to, > > total or partial > > disclosure, reproduction, or dissemination) by persons other than the > > intended > > recipient(s) is prohibited. If you receive this e-mail in error, please > > notify the sender by > > phone or email immediately and delete it! > > > > ________________________________________ > > From: hongbin ma [[email protected]] > > Sent: Thursday, March 05, 2015 11:55 AM > > To: [email protected] > > Subject: Re: get wrong result for the query(both in kyin-0.71 and > > kylin-0.6x) > > > > hi dong, > > > > we looked into it, here's the reason: > > > > By default if you're making queries on the web client, a mode called > > "AcceptPartialResults" is enabled​, this is a protection mechanism that > > will only return part of the results to reduce server overhead. Honestly > it > > might hurt the correctness of order by queries. > > > > If you're seeking 100% correctness, after running the query you will > find a > > notification: > > *Note: Current results are partial, please click 'Show All' button to get > > all results.* > > Click the "Show All" button to disable the "AcceptPartialResults" mode, > and > > you'll get a right result. > > > > Notice "AcceptPartialResults" is only enabled by default at web client, > > you'll not meet such problems if you're using JDBC, ODBC or standard REST > > API. > > > > thanks > > hongbin > > > > > > On Thu, Mar 5, 2015 at 11:33 AM, dong wang <[email protected]> > wrote: > > > > > Sorry hongbin, it's my fault, for the 2 question, the row count is LESS > > > THAN 50000 > > > > > > 2015-03-05 11:28 GMT+08:00 dong wang <[email protected]>: > > > > > > > HI hongbin, > > > > 1, yes, as you guessed, the result of the above query is CORRECT if > we > > > > use ASC order, thus, the issue only occurs when DESC order. > > > > 2, the total row count of the result is GREATER THAN 50000 > definitely, > > > > even much more > > > > > > > > 2015-03-05 11:06 GMT+08:00 hongbin ma <[email protected]>: > > > > > > > >> for #2 to #4, will Kylin return correct answers if order by > ascending > > > >> order? > > > >> And what it the total result count of #4? > > > >> > > > >> On Thu, Mar 5, 2015 at 10:56 AM, Zhou, Qianhao <[email protected]> > > > wrote: > > > >> > > > >> > #1 when add condition for date type, you should add date before > the > > > time > > > >> > string > > > >> > > > > >> > the sql should be like: > > > >> > select my_date, sum(f1), sum(f2), sum(f3) from test where > > > >> > my_date >= date'2013-01-01' and my_date <= date'2015-01-31' > > > >> > > > > >> > > > > >> > > > > >> > Best Regard > > > >> > Zhou QianHao > > > >> > > > > >> > > > > >> > > > > >> > > > > >> > > > > >> > On 3/5/15, 10:40 AM, "dong wang" <[email protected]> wrote: > > > >> > > > > >> > >Hi all, I have experienced the wrong result for quite a long time > > > since > > > >> > >kylin-0.6x, thus, really hope that someone can do a help. > > > >> > > > > > >> > >CUBE: > > > >> > >1, the cube is based on just one big fact table. there are 10 > > > >> attributes, > > > >> > >and 3 metrics. > > > >> > >for exmaple: my_date, col2, col3, col4, col5, col6, col7, col8, > > col9, > > > >> > >col10, sum(f1), sum(f2), sum(f2) > > > >> > > > > > >> > >2, my_date, col2 is normal columns, and my_date is the partition > > > >> > >attribute, > > > >> > >which is date type in HIVE, > > > >> > > > > > >> > >3, (col3, col4), (col5, col6), (col7, col8), (col9, col10) are 4 > > > >> > >hirerarchies, and col3, col5, col7, col9 are parents of its > > hirarchy > > > >> > >respectively. > > > >> > > > > > >> > >4, the aggregation groups are: my_date, col2, col3, col4, col5, > > col6, > > > >> > >col7, > > > >> > >col8, col9, col10 > > > >> > > > > > >> > >5, the rowkeys are: my_date, col2, col3, col4, col5, col6, col7, > > > col8, > > > >> > >col9, col10 > > > >> > > > > > >> > > > > > >> > >NOTE THAT: in the cube, the range of my_date is between > 2013-10-01 > > > and > > > >> > >2015-03-04 > > > >> > > > > > >> > > > > > >> > >2 kinds of issue as below: > > > >> > > > > > >> > >QUERY: > > > >> > >1, select my_date, sum(f1), sum(f2), sum(f3) from test where > > > >> > >my_date>='2013-01-01' and my_date<= '2015-01-31' > > > >> > >note "my_date" is HIVE date type. and the error: Cannot apply > '>=' > > to > > > >> > >arguments of type '<DATE> >= <CHAR(10)>'. Supported form(s): > > > >> > >'<COMPARABLE_TYPE> >= <COMPARABLE_TYPE>' while executing SQL, how > > > >> should I > > > >> > >modify the SQL? > > > >> > > > > > >> > >2, select my_date, col3, col4, sum(f1), sum(f2), sum(f3) from > test > > > >> group > > > >> > >by > > > >> > >my_date, col3, col4, order by my_date desc, col3 desc, col4 desc; > > > >> > > > > > >> > >suppose that the actual row count of the result is GREATER THAN > the > > > >> > >default > > > >> > >limit 50000; > > > >> > > > > > >> > >then, the result is: > > > >> > > > > > >> > >my_date, col3, col4, sum(f1), sum(f2), sum(f3) > > > >> > >2013-10-23 > > > >> > >2013-10-23 > > > >> > >... > > > >> > >2013-10-22 > > > >> > >2013-10-22 > > > >> > >... > > > >> > >2013-10-21 > > > >> > >2013-10-21 > > > >> > >... > > > >> > >... > > > >> > >... > > > >> > >2013-10-01 > > > >> > >2013-10-01(note 2013-10-01 is the earliest day of the cube) > > > >> > > > > > >> > >totally, there are 50000 rows for the result, but it is obvious > > that > > > >> the > > > >> > >order by and limit doesn't work correctly. > > > >> > > > > > >> > >3, select my_date, col3, col4, sum(f1), sum(f2), sum(f3) from > test > > > >> group > > > >> > >by > > > >> > >my_date, col3, col4, order by my_date desc, col3 desc, col4 desc > > > LIMIT > > > >> 100 > > > >> > > > > > >> > >just adding "LIMIT 100" to the tail of the 2nd query above, then, > > the > > > >> > >result is: > > > >> > > > > > >> > >my_date, col3, col4, sum(f1), sum(f2), sum(f3) > > > >> > >2013-10-05 > > > >> > >2013-10-05 > > > >> > >.... > > > >> > >2013-10-05 > > > >> > > > > > >> > >totally, there are 100 rows. all of which are "2013-10-05", > > > apparently, > > > >> > >the > > > >> > >result is not correct as well. > > > >> > > > > > >> > >4, select my_date, sum(f1), sum(f2), sum(f3) from test group by > > > my_date > > > >> > >order by my_date desc LIMIT 100 > > > >> > >if I just add "my_date" as "aggreation level" as above, the > result > > of > > > >> the > > > >> > >4th query is correct. > > > >> > > > > >> > > > > >> > > > > > > > > > > > > > >
