Hi Hongbin, If you see my queries I am doing left join dim tables with fact and if I do a count on it I get correct result. But if I try to select each column in join to get record out I get only one result. Actually my query is not on only select * from fact.. So do you mean to say such queries are not supported in kylin 0.6. Regards Santosh Akhilesh On Fri, 6 Mar 2015 at 8:07 am, Santosh Akhilesh <[email protected]> wrote:
> Hi Hongbin, > Ok, I will add group by and check. But can you please also check why my > dim table user is not being recognised in query. And also I can't query if > I add the column value which is in fact and I have added a measure over it. > I had sent my cube request Json. > Regards > Santosh > On Fri, 6 Mar 2015 at 7:50 am, hongbin ma <[email protected]> wrote: > >> hi santoshakhilesh, >> >> In 0.6 versions, Kylin is only meant for aggregated queries, not detailed >> level queries. Try modifying your query by adding some groupbys. >> >> I believe "select * from factable" will work for your case. >> The reason why you can support "select * from facttable" is that when >> Kylin >> detects such query pattern, it will secretly add all the dimension columns >> and metric columns on fact table to the query and the query will be >> treated >> like: >> >> select dim1,dim2,dim3.... dimN, metric1,metric2...metricN from facttable >> groupby dim1, dim2, dim3,,,dimN >> >> So the result of "select * from factable" is aggregated if two rows in >> fact >> table share the same dimension values. >> >> In 0.7 version which we introduced a new module called "inverted index", >> with data organized in "inverted index" Kylin can better answer detailed >> level queries (By MPP instead of cubing). However this is still under >> development and not ready for community use. >> >> thanks >> hongbin >> >> >> On Thu, Mar 5, 2015 at 11:34 PM, Santoshakhilesh < >> [email protected]> wrote: >> >> > Hi All , >> > >> > I am facing couple of issue regarding my queries in 0.6 >> > >> > 1. Query result in web ui is not correct , following return only one >> > record , if I run count(*) on same join >> > result is 1000 which is correct >> > SELECT >> > AREA.NAME >> > ,AREA.COUNTRY >> > ,AREA.PROVINCE >> > ,AREA.CITY >> > ,RESOURCEFACT.RESOURCEID >> > ,RESOURCEFACT.INDICATORID >> > ,RESOURCEFACT.COLLECTIONTIME >> > ,RESOURCEFACT.PERIOD >> > ,RESOURCEFACT.AREAID >> > ,RESOURCEFACT.CUSTOMERID >> > FROM RESOURCEFACT >> > LEFT JOIN AREA >> > ON RESOURCEFACT.AREAID = AREA.AREAID >> > LEFT JOIN KPI >> > ON RESOURCEFACT.INDICATORID = KPI.INDICATORID >> > >> > 2. The sample query on cube wizard is following; I have another table >> User >> > when I run this query after cube build Kylin throws exception that it >> cant >> > find user table. even select count(*) on User table fails. But the >> table is >> > there in hive. RESOURCEFACT.VALUE this column cant be recognized by sql >> > query on web ui. >> > >> > SELECT >> > AREA.NAME >> > ,AREA.COUNTRY >> > ,AREA.PROVINCE >> > ,AREA.CITY >> > ,RESOURCEFACT.RESOURCEID >> > ,RESOURCEFACT.INDICATORID >> > ,RESOURCEFACT.COLLECTIONTIME >> > ,RESOURCEFACT.PERIOD >> > ,RESOURCEFACT.VALUE >> > ,RESOURCEFACT.AREAID >> > ,RESOURCEFACT.CUSTOMERID >> > FROM RESOURCEFACT >> > LEFT JOIN AREA >> > ON RESOURCEFACT.AREAID = AREA.AREAID >> > LEFT JOIN USER >> > ON RESOURCEFACT.CUSTOMERID = USER.USERID >> > LEFT JOIN KPI >> > ON RESOURCEFACT.INDICATORID = KPI.INDICATORID >> > >> > MY Cube Json is as below. >> > { >> > "uuid": "95204f18-24f7-43e5-bead-368900f74f9c", >> > "name": "secondcube", >> > "description": "", >> > "dimensions": [ >> > { >> > "id": 1, >> > "name": "RESOURCEGROUP", >> > "join": { >> > "type": "left", >> > "primary_key": [ >> > "AREAID" >> > ], >> > "foreign_key": [ >> > "AREAID" >> > ] >> > }, >> > "hierarchy": [ >> > { >> > "level": "1", >> > "column": "NAME" >> > }, >> > { >> > "level": "2", >> > "column": "COUNTRY" >> > }, >> > { >> > "level": "3", >> > "column": "PROVINCE" >> > }, >> > { >> > "level": "4", >> > "column": "CITY" >> > } >> > ], >> > "table": "AREA", >> > "column": null, >> > "datatype": null, >> > "derived": null >> > }, >> > { >> > "id": 2, >> > "name": "RESOURCEFACT.RESOURCEID", >> > "join": null, >> > "hierarchy": null, >> > "table": "RESOURCEFACT", >> > "column": "RESOURCEID", >> > "datatype": null, >> > "derived": null >> > }, >> > { >> > "id": 3, >> > "name": "RESOURCEFACT.INDICATORID", >> > "join": null, >> > "hierarchy": null, >> > "table": "RESOURCEFACT", >> > "column": "INDICATORID", >> > "datatype": null, >> > "derived": null >> > }, >> > { >> > "id": 4, >> > "name": "RESOURCEFACT.COLLECTIONTIME", >> > "join": null, >> > "hierarchy": null, >> > "table": "RESOURCEFACT", >> > "column": "COLLECTIONTIME", >> > "datatype": null, >> > "derived": null >> > }, >> > { >> > "id": 5, >> > "name": "RESOURCEFACT.PERIOD", >> > "join": null, >> > "hierarchy": null, >> > "table": "RESOURCEFACT", >> > "column": "PERIOD", >> > "datatype": null, >> > "derived": null >> > }, >> > { >> > "id": 6, >> > "name": "RESOURCEFACT.VALUE", >> > "join": null, >> > "hierarchy": null, >> > "table": "RESOURCEFACT", >> > "column": "VALUE", >> > "datatype": null, >> > "derived": null >> > }, >> > { >> > "id": 7, >> > "name": "RESOURCEFACT.AREAID", >> > "join": null, >> > "hierarchy": null, >> > "table": "RESOURCEFACT", >> > "column": "AREAID", >> > "datatype": null, >> > "derived": null >> > }, >> > { >> > "id": 8, >> > "name": "RESOURCEFACT.CUSTOMERID", >> > "join": null, >> > "hierarchy": null, >> > "table": "RESOURCEFACT", >> > "column": "CUSTOMERID", >> > "datatype": null, >> > "derived": null >> > }, >> > { >> > "id": 9, >> > "name": "USER_DERIVED", >> > "join": { >> > "type": "left", >> > "primary_key": [ >> > "USERID" >> > ], >> > "foreign_key": [ >> > "CUSTOMERID" >> > ] >> > }, >> > "hierarchy": null, >> > "table": "USER", >> > "column": "{FK}", >> > "datatype": null, >> > "derived": [ >> > "USERID", >> > "NAME" >> > ] >> > }, >> > { >> > "id": 10, >> > "name": "KPI_DERIVED", >> > "join": { >> > "type": "left", >> > "primary_key": [ >> > "INDICATORID" >> > ], >> > "foreign_key": [ >> > "INDICATORID" >> > ] >> > }, >> > "hierarchy": null, >> > "table": "KPI", >> > "column": "{FK}", >> > "datatype": null, >> > "derived": [ >> > "INDICATORID", >> > "NAME" >> > ] >> > }, >> > { >> > "id": 11, >> > "name": "AREA_DERIVED", >> > "join": { >> > "type": "left", >> > "primary_key": [ >> > "AREAID" >> > ], >> > "foreign_key": [ >> > "AREAID" >> > ] >> > }, >> > "hierarchy": null, >> > "table": "AREA", >> > "column": "{FK}", >> > "datatype": null, >> > "derived": [ >> > "AREAID" >> > ] >> > } >> > ], >> > "measures": [ >> > { >> > "id": 1, >> > "name": "_COUNT_", >> > "function": { >> > "expression": "COUNT", >> > "parameter": { >> > "type": "constant", >> > "value": "1" >> > }, >> > "returntype": "bigint" >> > }, >> > "dependent_measure_ref": null >> > }, >> > { >> > "id": 2, >> > "name": "SUMVALUE", >> > "function": { >> > "expression": "SUM", >> > "parameter": { >> > "type": "column", >> > "value": "VALUE" >> > }, >> > "returntype": "bigint" >> > }, >> > "dependent_measure_ref": null >> > } >> > ], >> > "rowkey": { >> > "rowkey_columns": [ >> > { >> > "column": "NAME", >> > "length": 0, >> > "dictionary": "true", >> > "mandatory": false >> > }, >> > { >> > "column": "COUNTRY", >> > "length": 0, >> > "dictionary": "true", >> > "mandatory": false >> > }, >> > { >> > "column": "PROVINCE", >> > "length": 0, >> > "dictionary": "true", >> > "mandatory": false >> > }, >> > { >> > "column": "CITY", >> > "length": 0, >> > "dictionary": "true", >> > "mandatory": false >> > }, >> > { >> > "column": "RESOURCEID", >> > "length": 0, >> > "dictionary": "true", >> > "mandatory": false >> > }, >> > { >> > "column": "INDICATORID", >> > "length": 0, >> > "dictionary": "true", >> > "mandatory": false >> > }, >> > { >> > "column": "COLLECTIONTIME", >> > "length": 0, >> > "dictionary": "true", >> > "mandatory": false >> > }, >> > { >> > "column": "PERIOD", >> > "length": 0, >> > "dictionary": "true", >> > "mandatory": false >> > }, >> > { >> > "column": "VALUE", >> > "length": 0, >> > "dictionary": "true", >> > "mandatory": false >> > }, >> > { >> > "column": "AREAID", >> > "length": 0, >> > "dictionary": "true", >> > "mandatory": false >> > }, >> > { >> > "column": "CUSTOMERID", >> > "length": 0, >> > "dictionary": "true", >> > "mandatory": false >> > } >> > ], >> > "aggregation_groups": [ >> > [ >> > "RESOURCEID", >> > "INDICATORID", >> > "COLLECTIONTIME", >> > "PERIOD", >> > "VALUE", >> > "AREAID", >> > "CUSTOMERID" >> > ], >> > [ >> > "NAME", >> > "COUNTRY", >> > "PROVINCE", >> > "CITY" >> > ] >> > ] >> > }, >> > "signature": "1CSqztliEqN4l0cOPrkFdA==", >> > "capacity": "MEDIUM", >> > "last_modified": 1425587998627, >> > "fact_table": "RESOURCEFACT", >> > "null_string": null, >> > "filter_condition": null, >> > "cube_partition_desc": { >> > "partition_date_column": null, >> > "partition_date_start": 0, >> > "cube_partition_type": "APPEND" >> > }, >> > "hbase_mapping": { >> > "column_family": [ >> > { >> > "name": "F1", >> > "columns": [ >> > { >> > "qualifier": "M", >> > "measure_refs": [ >> > "_COUNT_", >> > "SUMVALUE" >> > ] >> > } >> > ] >> > } >> > ] >> > }, >> > "notify_list": [] >> > } >> > >> > Regards, >> > Santosh Akhilesh >> > Bangalore R&D >> > HUAWEI TECHNOLOGIES CO.,LTD. >> > >> > www.huawei.com >> > >> > ------------------------------------------------------------ >> ------------------------------------------------------------------------- >> > This e-mail and its attachments contain confidential information from >> > HUAWEI, which >> > is intended only for the person or entity whose address is listed above. >> > Any use of the >> > information contained herein in any way (including, but not limited to, >> > total or partial >> > disclosure, reproduction, or dissemination) by persons other than the >> > intended >> > recipient(s) is prohibited. If you receive this e-mail in error, please >> > notify the sender by >> > phone or email immediately and delete it! >> > >> > ________________________________________ >> > From: Santoshakhilesh >> > Sent: Thursday, March 05, 2015 8:36 PM >> > To: [email protected] >> > Subject: RE: get wrong result for the query(both in kyin-0.71 and >> > kylin-0.6x) >> > >> > hi dong , >> > I am facing same issue when I run a count(*) on a query I get correct >> > result >> > But when I do select * on same query I get only 1 result , but there >> are >> > 1000 records as shown by client. >> > >> > I am not getting following notifications , I am using 0.6.5 >> > >> > *Note: Current results are partial, please click 'Show All' button to >> get >> > all results.* >> > Click the "Show All" button to disable the "AcceptPartialResults" mode, >> and >> > you'll get a right result. >> > >> > >> > Regards, >> > Santosh Akhilesh >> > Bangalore R&D >> > HUAWEI TECHNOLOGIES CO.,LTD. >> > >> > www.huawei.com >> > >> > ------------------------------------------------------------ >> ------------------------------------------------------------------------- >> > This e-mail and its attachments contain confidential information from >> > HUAWEI, which >> > is intended only for the person or entity whose address is listed above. >> > Any use of the >> > information contained herein in any way (including, but not limited to, >> > total or partial >> > disclosure, reproduction, or dissemination) by persons other than the >> > intended >> > recipient(s) is prohibited. If you receive this e-mail in error, please >> > notify the sender by >> > phone or email immediately and delete it! >> > >> > ________________________________________ >> > From: hongbin ma [[email protected]] >> > Sent: Thursday, March 05, 2015 11:55 AM >> > To: [email protected] >> > Subject: Re: get wrong result for the query(both in kyin-0.71 and >> > kylin-0.6x) >> > >> > hi dong, >> > >> > we looked into it, here's the reason: >> > >> > By default if you're making queries on the web client, a mode called >> > "AcceptPartialResults" is enabled​, this is a protection mechanism that >> > will only return part of the results to reduce server overhead. >> Honestly it >> > might hurt the correctness of order by queries. >> > >> > If you're seeking 100% correctness, after running the query you will >> find a >> > notification: >> > *Note: Current results are partial, please click 'Show All' button to >> get >> > all results.* >> > Click the "Show All" button to disable the "AcceptPartialResults" mode, >> and >> > you'll get a right result. >> > >> > Notice "AcceptPartialResults" is only enabled by default at web client, >> > you'll not meet such problems if you're using JDBC, ODBC or standard >> REST >> > API. >> > >> > thanks >> > hongbin >> > >> > >> > On Thu, Mar 5, 2015 at 11:33 AM, dong wang <[email protected]> >> wrote: >> > >> > > Sorry hongbin, it's my fault, for the 2 question, the row count is >> LESS >> > > THAN 50000 >> > > >> > > 2015-03-05 11:28 GMT+08:00 dong wang <[email protected]>: >> > > >> > > > HI hongbin, >> > > > 1, yes, as you guessed, the result of the above query is CORRECT >> if we >> > > > use ASC order, thus, the issue only occurs when DESC order. >> > > > 2, the total row count of the result is GREATER THAN 50000 >> definitely, >> > > > even much more >> > > > >> > > > 2015-03-05 11:06 GMT+08:00 hongbin ma <[email protected]>: >> > > > >> > > >> for #2 to #4, will Kylin return correct answers if order by >> ascending >> > > >> order? >> > > >> And what it the total result count of #4? >> > > >> >> > > >> On Thu, Mar 5, 2015 at 10:56 AM, Zhou, Qianhao <[email protected]> >> > > wrote: >> > > >> >> > > >> > #1 when add condition for date type, you should add date before >> the >> > > time >> > > >> > string >> > > >> > >> > > >> > the sql should be like: >> > > >> > select my_date, sum(f1), sum(f2), sum(f3) from test where >> > > >> > my_date >= date'2013-01-01' and my_date <= date'2015-01-31' >> > > >> > >> > > >> > >> > > >> > >> > > >> > Best Regard >> > > >> > Zhou QianHao >> > > >> > >> > > >> > >> > > >> > >> > > >> > >> > > >> > >> > > >> > On 3/5/15, 10:40 AM, "dong wang" <[email protected]> wrote: >> > > >> > >> > > >> > >Hi all, I have experienced the wrong result for quite a long >> time >> > > since >> > > >> > >kylin-0.6x, thus, really hope that someone can do a help. >> > > >> > > >> > > >> > >CUBE: >> > > >> > >1, the cube is based on just one big fact table. there are 10 >> > > >> attributes, >> > > >> > >and 3 metrics. >> > > >> > >for exmaple: my_date, col2, col3, col4, col5, col6, col7, col8, >> > col9, >> > > >> > >col10, sum(f1), sum(f2), sum(f2) >> > > >> > > >> > > >> > >2, my_date, col2 is normal columns, and my_date is the partition >> > > >> > >attribute, >> > > >> > >which is date type in HIVE, >> > > >> > > >> > > >> > >3, (col3, col4), (col5, col6), (col7, col8), (col9, col10) are 4 >> > > >> > >hirerarchies, and col3, col5, col7, col9 are parents of its >> > hirarchy >> > > >> > >respectively. >> > > >> > > >> > > >> > >4, the aggregation groups are: my_date, col2, col3, col4, col5, >> > col6, >> > > >> > >col7, >> > > >> > >col8, col9, col10 >> > > >> > > >> > > >> > >5, the rowkeys are: my_date, col2, col3, col4, col5, col6, col7, >> > > col8, >> > > >> > >col9, col10 >> > > >> > > >> > > >> > > >> > > >> > >NOTE THAT: in the cube, the range of my_date is between >> 2013-10-01 >> > > and >> > > >> > >2015-03-04 >> > > >> > > >> > > >> > > >> > > >> > >2 kinds of issue as below: >> > > >> > > >> > > >> > >QUERY: >> > > >> > >1, select my_date, sum(f1), sum(f2), sum(f3) from test where >> > > >> > >my_date>='2013-01-01' and my_date<= '2015-01-31' >> > > >> > >note "my_date" is HIVE date type. and the error: Cannot apply >> '>=' >> > to >> > > >> > >arguments of type '<DATE> >= <CHAR(10)>'. Supported form(s): >> > > >> > >'<COMPARABLE_TYPE> >= <COMPARABLE_TYPE>' while executing SQL, >> how >> > > >> should I >> > > >> > >modify the SQL? >> > > >> > > >> > > >> > >2, select my_date, col3, col4, sum(f1), sum(f2), sum(f3) from >> test >> > > >> group >> > > >> > >by >> > > >> > >my_date, col3, col4, order by my_date desc, col3 desc, col4 >> desc; >> > > >> > > >> > > >> > >suppose that the actual row count of the result is GREATER THAN >> the >> > > >> > >default >> > > >> > >limit 50000; >> > > >> > > >> > > >> > >then, the result is: >> > > >> > > >> > > >> > >my_date, col3, col4, sum(f1), sum(f2), sum(f3) >> > > >> > >2013-10-23 >> > > >> > >2013-10-23 >> > > >> > >... >> > > >> > >2013-10-22 >> > > >> > >2013-10-22 >> > > >> > >... >> > > >> > >2013-10-21 >> > > >> > >2013-10-21 >> > > >> > >... >> > > >> > >... >> > > >> > >... >> > > >> > >2013-10-01 >> > > >> > >2013-10-01(note 2013-10-01 is the earliest day of the cube) >> > > >> > > >> > > >> > >totally, there are 50000 rows for the result, but it is obvious >> > that >> > > >> the >> > > >> > >order by and limit doesn't work correctly. >> > > >> > > >> > > >> > >3, select my_date, col3, col4, sum(f1), sum(f2), sum(f3) from >> test >> > > >> group >> > > >> > >by >> > > >> > >my_date, col3, col4, order by my_date desc, col3 desc, col4 desc >> > > LIMIT >> > > >> 100 >> > > >> > > >> > > >> > >just adding "LIMIT 100" to the tail of the 2nd query above, >> then, >> > the >> > > >> > >result is: >> > > >> > > >> > > >> > >my_date, col3, col4, sum(f1), sum(f2), sum(f3) >> > > >> > >2013-10-05 >> > > >> > >2013-10-05 >> > > >> > >.... >> > > >> > >2013-10-05 >> > > >> > > >> > > >> > >totally, there are 100 rows. all of which are "2013-10-05", >> > > apparently, >> > > >> > >the >> > > >> > >result is not correct as well. >> > > >> > > >> > > >> > >4, select my_date, sum(f1), sum(f2), sum(f3) from test group by >> > > my_date >> > > >> > >order by my_date desc LIMIT 100 >> > > >> > >if I just add "my_date" as "aggreation level" as above, the >> result >> > of >> > > >> the >> > > >> > >4th query is correct. >> > > >> > >> > > >> > >> > > >> >> > > > >> > > > >> > > >> > >> >
