hi santosh for now please make sure all your queries have groupby clause
On Fri, Mar 6, 2015 at 10:48 AM, Santosh Akhilesh <[email protected] > wrote: > Hi Hongbin, > If you see my queries I am doing left join dim tables with fact and if I do > a count on it I get correct result. But if I try to select each column in > join to get record out I get only one result. Actually my query is not on > only select * from fact.. > So do you mean to say such queries are not supported in kylin 0.6. > Regards > Santosh Akhilesh > On Fri, 6 Mar 2015 at 8:07 am, Santosh Akhilesh <[email protected] > > > wrote: > > > Hi Hongbin, > > Ok, I will add group by and check. But can you please also check why my > > dim table user is not being recognised in query. And also I can't query > if > > I add the column value which is in fact and I have added a measure over > it. > > I had sent my cube request Json. > > Regards > > Santosh > > On Fri, 6 Mar 2015 at 7:50 am, hongbin ma <[email protected]> wrote: > > > >> hi santoshakhilesh, > >> > >> In 0.6 versions, Kylin is only meant for aggregated queries, not > detailed > >> level queries. Try modifying your query by adding some groupbys. > >> > >> I believe "select * from factable" will work for your case. > >> The reason why you can support "select * from facttable" is that when > >> Kylin > >> detects such query pattern, it will secretly add all the dimension > columns > >> and metric columns on fact table to the query and the query will be > >> treated > >> like: > >> > >> select dim1,dim2,dim3.... dimN, metric1,metric2...metricN from facttable > >> groupby dim1, dim2, dim3,,,dimN > >> > >> So the result of "select * from factable" is aggregated if two rows in > >> fact > >> table share the same dimension values. > >> > >> In 0.7 version which we introduced a new module called "inverted index", > >> with data organized in "inverted index" Kylin can better answer detailed > >> level queries (By MPP instead of cubing). However this is still under > >> development and not ready for community use. > >> > >> thanks > >> hongbin > >> > >> > >> On Thu, Mar 5, 2015 at 11:34 PM, Santoshakhilesh < > >> [email protected]> wrote: > >> > >> > Hi All , > >> > > >> > I am facing couple of issue regarding my queries in 0.6 > >> > > >> > 1. Query result in web ui is not correct , following return only one > >> > record , if I run count(*) on same join > >> > result is 1000 which is correct > >> > SELECT > >> > AREA.NAME > >> > ,AREA.COUNTRY > >> > ,AREA.PROVINCE > >> > ,AREA.CITY > >> > ,RESOURCEFACT.RESOURCEID > >> > ,RESOURCEFACT.INDICATORID > >> > ,RESOURCEFACT.COLLECTIONTIME > >> > ,RESOURCEFACT.PERIOD > >> > ,RESOURCEFACT.AREAID > >> > ,RESOURCEFACT.CUSTOMERID > >> > FROM RESOURCEFACT > >> > LEFT JOIN AREA > >> > ON RESOURCEFACT.AREAID = AREA.AREAID > >> > LEFT JOIN KPI > >> > ON RESOURCEFACT.INDICATORID = KPI.INDICATORID > >> > > >> > 2. The sample query on cube wizard is following; I have another table > >> User > >> > when I run this query after cube build Kylin throws exception that it > >> cant > >> > find user table. even select count(*) on User table fails. But the > >> table is > >> > there in hive. RESOURCEFACT.VALUE this column cant be recognized by > sql > >> > query on web ui. > >> > > >> > SELECT > >> > AREA.NAME > >> > ,AREA.COUNTRY > >> > ,AREA.PROVINCE > >> > ,AREA.CITY > >> > ,RESOURCEFACT.RESOURCEID > >> > ,RESOURCEFACT.INDICATORID > >> > ,RESOURCEFACT.COLLECTIONTIME > >> > ,RESOURCEFACT.PERIOD > >> > ,RESOURCEFACT.VALUE > >> > ,RESOURCEFACT.AREAID > >> > ,RESOURCEFACT.CUSTOMERID > >> > FROM RESOURCEFACT > >> > LEFT JOIN AREA > >> > ON RESOURCEFACT.AREAID = AREA.AREAID > >> > LEFT JOIN USER > >> > ON RESOURCEFACT.CUSTOMERID = USER.USERID > >> > LEFT JOIN KPI > >> > ON RESOURCEFACT.INDICATORID = KPI.INDICATORID > >> > > >> > MY Cube Json is as below. > >> > { > >> > "uuid": "95204f18-24f7-43e5-bead-368900f74f9c", > >> > "name": "secondcube", > >> > "description": "", > >> > "dimensions": [ > >> > { > >> > "id": 1, > >> > "name": "RESOURCEGROUP", > >> > "join": { > >> > "type": "left", > >> > "primary_key": [ > >> > "AREAID" > >> > ], > >> > "foreign_key": [ > >> > "AREAID" > >> > ] > >> > }, > >> > "hierarchy": [ > >> > { > >> > "level": "1", > >> > "column": "NAME" > >> > }, > >> > { > >> > "level": "2", > >> > "column": "COUNTRY" > >> > }, > >> > { > >> > "level": "3", > >> > "column": "PROVINCE" > >> > }, > >> > { > >> > "level": "4", > >> > "column": "CITY" > >> > } > >> > ], > >> > "table": "AREA", > >> > "column": null, > >> > "datatype": null, > >> > "derived": null > >> > }, > >> > { > >> > "id": 2, > >> > "name": "RESOURCEFACT.RESOURCEID", > >> > "join": null, > >> > "hierarchy": null, > >> > "table": "RESOURCEFACT", > >> > "column": "RESOURCEID", > >> > "datatype": null, > >> > "derived": null > >> > }, > >> > { > >> > "id": 3, > >> > "name": "RESOURCEFACT.INDICATORID", > >> > "join": null, > >> > "hierarchy": null, > >> > "table": "RESOURCEFACT", > >> > "column": "INDICATORID", > >> > "datatype": null, > >> > "derived": null > >> > }, > >> > { > >> > "id": 4, > >> > "name": "RESOURCEFACT.COLLECTIONTIME", > >> > "join": null, > >> > "hierarchy": null, > >> > "table": "RESOURCEFACT", > >> > "column": "COLLECTIONTIME", > >> > "datatype": null, > >> > "derived": null > >> > }, > >> > { > >> > "id": 5, > >> > "name": "RESOURCEFACT.PERIOD", > >> > "join": null, > >> > "hierarchy": null, > >> > "table": "RESOURCEFACT", > >> > "column": "PERIOD", > >> > "datatype": null, > >> > "derived": null > >> > }, > >> > { > >> > "id": 6, > >> > "name": "RESOURCEFACT.VALUE", > >> > "join": null, > >> > "hierarchy": null, > >> > "table": "RESOURCEFACT", > >> > "column": "VALUE", > >> > "datatype": null, > >> > "derived": null > >> > }, > >> > { > >> > "id": 7, > >> > "name": "RESOURCEFACT.AREAID", > >> > "join": null, > >> > "hierarchy": null, > >> > "table": "RESOURCEFACT", > >> > "column": "AREAID", > >> > "datatype": null, > >> > "derived": null > >> > }, > >> > { > >> > "id": 8, > >> > "name": "RESOURCEFACT.CUSTOMERID", > >> > "join": null, > >> > "hierarchy": null, > >> > "table": "RESOURCEFACT", > >> > "column": "CUSTOMERID", > >> > "datatype": null, > >> > "derived": null > >> > }, > >> > { > >> > "id": 9, > >> > "name": "USER_DERIVED", > >> > "join": { > >> > "type": "left", > >> > "primary_key": [ > >> > "USERID" > >> > ], > >> > "foreign_key": [ > >> > "CUSTOMERID" > >> > ] > >> > }, > >> > "hierarchy": null, > >> > "table": "USER", > >> > "column": "{FK}", > >> > "datatype": null, > >> > "derived": [ > >> > "USERID", > >> > "NAME" > >> > ] > >> > }, > >> > { > >> > "id": 10, > >> > "name": "KPI_DERIVED", > >> > "join": { > >> > "type": "left", > >> > "primary_key": [ > >> > "INDICATORID" > >> > ], > >> > "foreign_key": [ > >> > "INDICATORID" > >> > ] > >> > }, > >> > "hierarchy": null, > >> > "table": "KPI", > >> > "column": "{FK}", > >> > "datatype": null, > >> > "derived": [ > >> > "INDICATORID", > >> > "NAME" > >> > ] > >> > }, > >> > { > >> > "id": 11, > >> > "name": "AREA_DERIVED", > >> > "join": { > >> > "type": "left", > >> > "primary_key": [ > >> > "AREAID" > >> > ], > >> > "foreign_key": [ > >> > "AREAID" > >> > ] > >> > }, > >> > "hierarchy": null, > >> > "table": "AREA", > >> > "column": "{FK}", > >> > "datatype": null, > >> > "derived": [ > >> > "AREAID" > >> > ] > >> > } > >> > ], > >> > "measures": [ > >> > { > >> > "id": 1, > >> > "name": "_COUNT_", > >> > "function": { > >> > "expression": "COUNT", > >> > "parameter": { > >> > "type": "constant", > >> > "value": "1" > >> > }, > >> > "returntype": "bigint" > >> > }, > >> > "dependent_measure_ref": null > >> > }, > >> > { > >> > "id": 2, > >> > "name": "SUMVALUE", > >> > "function": { > >> > "expression": "SUM", > >> > "parameter": { > >> > "type": "column", > >> > "value": "VALUE" > >> > }, > >> > "returntype": "bigint" > >> > }, > >> > "dependent_measure_ref": null > >> > } > >> > ], > >> > "rowkey": { > >> > "rowkey_columns": [ > >> > { > >> > "column": "NAME", > >> > "length": 0, > >> > "dictionary": "true", > >> > "mandatory": false > >> > }, > >> > { > >> > "column": "COUNTRY", > >> > "length": 0, > >> > "dictionary": "true", > >> > "mandatory": false > >> > }, > >> > { > >> > "column": "PROVINCE", > >> > "length": 0, > >> > "dictionary": "true", > >> > "mandatory": false > >> > }, > >> > { > >> > "column": "CITY", > >> > "length": 0, > >> > "dictionary": "true", > >> > "mandatory": false > >> > }, > >> > { > >> > "column": "RESOURCEID", > >> > "length": 0, > >> > "dictionary": "true", > >> > "mandatory": false > >> > }, > >> > { > >> > "column": "INDICATORID", > >> > "length": 0, > >> > "dictionary": "true", > >> > "mandatory": false > >> > }, > >> > { > >> > "column": "COLLECTIONTIME", > >> > "length": 0, > >> > "dictionary": "true", > >> > "mandatory": false > >> > }, > >> > { > >> > "column": "PERIOD", > >> > "length": 0, > >> > "dictionary": "true", > >> > "mandatory": false > >> > }, > >> > { > >> > "column": "VALUE", > >> > "length": 0, > >> > "dictionary": "true", > >> > "mandatory": false > >> > }, > >> > { > >> > "column": "AREAID", > >> > "length": 0, > >> > "dictionary": "true", > >> > "mandatory": false > >> > }, > >> > { > >> > "column": "CUSTOMERID", > >> > "length": 0, > >> > "dictionary": "true", > >> > "mandatory": false > >> > } > >> > ], > >> > "aggregation_groups": [ > >> > [ > >> > "RESOURCEID", > >> > "INDICATORID", > >> > "COLLECTIONTIME", > >> > "PERIOD", > >> > "VALUE", > >> > "AREAID", > >> > "CUSTOMERID" > >> > ], > >> > [ > >> > "NAME", > >> > "COUNTRY", > >> > "PROVINCE", > >> > "CITY" > >> > ] > >> > ] > >> > }, > >> > "signature": "1CSqztliEqN4l0cOPrkFdA==", > >> > "capacity": "MEDIUM", > >> > "last_modified": 1425587998627, > >> > "fact_table": "RESOURCEFACT", > >> > "null_string": null, > >> > "filter_condition": null, > >> > "cube_partition_desc": { > >> > "partition_date_column": null, > >> > "partition_date_start": 0, > >> > "cube_partition_type": "APPEND" > >> > }, > >> > "hbase_mapping": { > >> > "column_family": [ > >> > { > >> > "name": "F1", > >> > "columns": [ > >> > { > >> > "qualifier": "M", > >> > "measure_refs": [ > >> > "_COUNT_", > >> > "SUMVALUE" > >> > ] > >> > } > >> > ] > >> > } > >> > ] > >> > }, > >> > "notify_list": [] > >> > } > >> > > >> > Regards, > >> > Santosh Akhilesh > >> > Bangalore R&D > >> > HUAWEI TECHNOLOGIES CO.,LTD. > >> > > >> > www.huawei.com > >> > > >> > ------------------------------------------------------------ > >> > ------------------------------------------------------------------------- > >> > This e-mail and its attachments contain confidential information from > >> > HUAWEI, which > >> > is intended only for the person or entity whose address is listed > above. > >> > Any use of the > >> > information contained herein in any way (including, but not limited > to, > >> > total or partial > >> > disclosure, reproduction, or dissemination) by persons other than the > >> > intended > >> > recipient(s) is prohibited. If you receive this e-mail in error, > please > >> > notify the sender by > >> > phone or email immediately and delete it! > >> > > >> > ________________________________________ > >> > From: Santoshakhilesh > >> > Sent: Thursday, March 05, 2015 8:36 PM > >> > To: [email protected] > >> > Subject: RE: get wrong result for the query(both in kyin-0.71 and > >> > kylin-0.6x) > >> > > >> > hi dong , > >> > I am facing same issue when I run a count(*) on a query I get > correct > >> > result > >> > But when I do select * on same query I get only 1 result , but there > >> are > >> > 1000 records as shown by client. > >> > > >> > I am not getting following notifications , I am using 0.6.5 > >> > > >> > *Note: Current results are partial, please click 'Show All' button to > >> get > >> > all results.* > >> > Click the "Show All" button to disable the "AcceptPartialResults" > mode, > >> and > >> > you'll get a right result. > >> > > >> > > >> > Regards, > >> > Santosh Akhilesh > >> > Bangalore R&D > >> > HUAWEI TECHNOLOGIES CO.,LTD. > >> > > >> > www.huawei.com > >> > > >> > ------------------------------------------------------------ > >> > ------------------------------------------------------------------------- > >> > This e-mail and its attachments contain confidential information from > >> > HUAWEI, which > >> > is intended only for the person or entity whose address is listed > above. > >> > Any use of the > >> > information contained herein in any way (including, but not limited > to, > >> > total or partial > >> > disclosure, reproduction, or dissemination) by persons other than the > >> > intended > >> > recipient(s) is prohibited. If you receive this e-mail in error, > please > >> > notify the sender by > >> > phone or email immediately and delete it! > >> > > >> > ________________________________________ > >> > From: hongbin ma [[email protected]] > >> > Sent: Thursday, March 05, 2015 11:55 AM > >> > To: [email protected] > >> > Subject: Re: get wrong result for the query(both in kyin-0.71 and > >> > kylin-0.6x) > >> > > >> > hi dong, > >> > > >> > we looked into it, here's the reason: > >> > > >> > By default if you're making queries on the web client, a mode called > >> > "AcceptPartialResults" is enabled​, this is a protection mechanism > that > >> > will only return part of the results to reduce server overhead. > >> Honestly it > >> > might hurt the correctness of order by queries. > >> > > >> > If you're seeking 100% correctness, after running the query you will > >> find a > >> > notification: > >> > *Note: Current results are partial, please click 'Show All' button to > >> get > >> > all results.* > >> > Click the "Show All" button to disable the "AcceptPartialResults" > mode, > >> and > >> > you'll get a right result. > >> > > >> > Notice "AcceptPartialResults" is only enabled by default at web > client, > >> > you'll not meet such problems if you're using JDBC, ODBC or standard > >> REST > >> > API. > >> > > >> > thanks > >> > hongbin > >> > > >> > > >> > On Thu, Mar 5, 2015 at 11:33 AM, dong wang <[email protected]> > >> wrote: > >> > > >> > > Sorry hongbin, it's my fault, for the 2 question, the row count is > >> LESS > >> > > THAN 50000 > >> > > > >> > > 2015-03-05 11:28 GMT+08:00 dong wang <[email protected]>: > >> > > > >> > > > HI hongbin, > >> > > > 1, yes, as you guessed, the result of the above query is CORRECT > >> if we > >> > > > use ASC order, thus, the issue only occurs when DESC order. > >> > > > 2, the total row count of the result is GREATER THAN 50000 > >> definitely, > >> > > > even much more > >> > > > > >> > > > 2015-03-05 11:06 GMT+08:00 hongbin ma <[email protected]>: > >> > > > > >> > > >> for #2 to #4, will Kylin return correct answers if order by > >> ascending > >> > > >> order? > >> > > >> And what it the total result count of #4? > >> > > >> > >> > > >> On Thu, Mar 5, 2015 at 10:56 AM, Zhou, Qianhao < > [email protected]> > >> > > wrote: > >> > > >> > >> > > >> > #1 when add condition for date type, you should add date before > >> the > >> > > time > >> > > >> > string > >> > > >> > > >> > > >> > the sql should be like: > >> > > >> > select my_date, sum(f1), sum(f2), sum(f3) from test where > >> > > >> > my_date >= date'2013-01-01' and my_date <= date'2015-01-31' > >> > > >> > > >> > > >> > > >> > > >> > > >> > > >> > Best Regard > >> > > >> > Zhou QianHao > >> > > >> > > >> > > >> > > >> > > >> > > >> > > >> > > >> > > >> > > >> > > >> > On 3/5/15, 10:40 AM, "dong wang" <[email protected]> > wrote: > >> > > >> > > >> > > >> > >Hi all, I have experienced the wrong result for quite a long > >> time > >> > > since > >> > > >> > >kylin-0.6x, thus, really hope that someone can do a help. > >> > > >> > > > >> > > >> > >CUBE: > >> > > >> > >1, the cube is based on just one big fact table. there are 10 > >> > > >> attributes, > >> > > >> > >and 3 metrics. > >> > > >> > >for exmaple: my_date, col2, col3, col4, col5, col6, col7, > col8, > >> > col9, > >> > > >> > >col10, sum(f1), sum(f2), sum(f2) > >> > > >> > > > >> > > >> > >2, my_date, col2 is normal columns, and my_date is the > partition > >> > > >> > >attribute, > >> > > >> > >which is date type in HIVE, > >> > > >> > > > >> > > >> > >3, (col3, col4), (col5, col6), (col7, col8), (col9, col10) > are 4 > >> > > >> > >hirerarchies, and col3, col5, col7, col9 are parents of its > >> > hirarchy > >> > > >> > >respectively. > >> > > >> > > > >> > > >> > >4, the aggregation groups are: my_date, col2, col3, col4, > col5, > >> > col6, > >> > > >> > >col7, > >> > > >> > >col8, col9, col10 > >> > > >> > > > >> > > >> > >5, the rowkeys are: my_date, col2, col3, col4, col5, col6, > col7, > >> > > col8, > >> > > >> > >col9, col10 > >> > > >> > > > >> > > >> > > > >> > > >> > >NOTE THAT: in the cube, the range of my_date is between > >> 2013-10-01 > >> > > and > >> > > >> > >2015-03-04 > >> > > >> > > > >> > > >> > > > >> > > >> > >2 kinds of issue as below: > >> > > >> > > > >> > > >> > >QUERY: > >> > > >> > >1, select my_date, sum(f1), sum(f2), sum(f3) from test where > >> > > >> > >my_date>='2013-01-01' and my_date<= '2015-01-31' > >> > > >> > >note "my_date" is HIVE date type. and the error: Cannot apply > >> '>=' > >> > to > >> > > >> > >arguments of type '<DATE> >= <CHAR(10)>'. Supported form(s): > >> > > >> > >'<COMPARABLE_TYPE> >= <COMPARABLE_TYPE>' while executing SQL, > >> how > >> > > >> should I > >> > > >> > >modify the SQL? > >> > > >> > > > >> > > >> > >2, select my_date, col3, col4, sum(f1), sum(f2), sum(f3) from > >> test > >> > > >> group > >> > > >> > >by > >> > > >> > >my_date, col3, col4, order by my_date desc, col3 desc, col4 > >> desc; > >> > > >> > > > >> > > >> > >suppose that the actual row count of the result is GREATER > THAN > >> the > >> > > >> > >default > >> > > >> > >limit 50000; > >> > > >> > > > >> > > >> > >then, the result is: > >> > > >> > > > >> > > >> > >my_date, col3, col4, sum(f1), sum(f2), sum(f3) > >> > > >> > >2013-10-23 > >> > > >> > >2013-10-23 > >> > > >> > >... > >> > > >> > >2013-10-22 > >> > > >> > >2013-10-22 > >> > > >> > >... > >> > > >> > >2013-10-21 > >> > > >> > >2013-10-21 > >> > > >> > >... > >> > > >> > >... > >> > > >> > >... > >> > > >> > >2013-10-01 > >> > > >> > >2013-10-01(note 2013-10-01 is the earliest day of the cube) > >> > > >> > > > >> > > >> > >totally, there are 50000 rows for the result, but it is > obvious > >> > that > >> > > >> the > >> > > >> > >order by and limit doesn't work correctly. > >> > > >> > > > >> > > >> > >3, select my_date, col3, col4, sum(f1), sum(f2), sum(f3) from > >> test > >> > > >> group > >> > > >> > >by > >> > > >> > >my_date, col3, col4, order by my_date desc, col3 desc, col4 > desc > >> > > LIMIT > >> > > >> 100 > >> > > >> > > > >> > > >> > >just adding "LIMIT 100" to the tail of the 2nd query above, > >> then, > >> > the > >> > > >> > >result is: > >> > > >> > > > >> > > >> > >my_date, col3, col4, sum(f1), sum(f2), sum(f3) > >> > > >> > >2013-10-05 > >> > > >> > >2013-10-05 > >> > > >> > >.... > >> > > >> > >2013-10-05 > >> > > >> > > > >> > > >> > >totally, there are 100 rows. all of which are "2013-10-05", > >> > > apparently, > >> > > >> > >the > >> > > >> > >result is not correct as well. > >> > > >> > > > >> > > >> > >4, select my_date, sum(f1), sum(f2), sum(f3) from test group > by > >> > > my_date > >> > > >> > >order by my_date desc LIMIT 100 > >> > > >> > >if I just add "my_date" as "aggreation level" as above, the > >> result > >> > of > >> > > >> the > >> > > >> > >4th query is correct. > >> > > >> > > >> > > >> > > >> > > >> > >> > > > > >> > > > > >> > > > >> > > >> > > >
