If you have a limit of 100, kylin is SUPPOSED to be far more efficient. However there's currently a issue here that might cause limit clause being overlooked (https://issues.apache.org/jira/browse/KYLIN-1936)
I'm working on fixing KYLIN-1936, it will be fixed in 1.5.4. On Thu, Aug 4, 2016 at 11:36 PM, Jason Hale <ja...@koddi.com> wrote: > True, but even if there's a limit of 100, it still has to scan all records? > Perhaps I'm just used to how Postgres handles that as it only scans the > necessary records, not the entire set if it's limited. I can rethink the > way I approach it if that's the case. > > On Thu, Aug 4, 2016 at 10:31 AM, hongbin ma <mahong...@apache.org> wrote: > > > Hi Jason > > > > As Shaofeng explained it's not reasonable to expect sub-second latency if > > you're returning tens of millions of records. You data model is quite > > simple and you don't have costly measure like distinct count etc., so > kylin > > should be performant on normal OLAP queries. > > > > Another advise: if the cardinality of the mandatory dimensions (CHILD_ID > > and SITE_ID) are very high, you might isolate such dimensions into a > > separate "aggregation group", so that 1. queries not touching these > > dimensions can be performant 2. calculate less cuboids. Please refer to > > http://kylin.apache.org/blog/2016/02/18/new-aggregation-group/ > > > > On Thu, Aug 4, 2016 at 11:11 PM, ShaoFeng Shi <shaofeng...@apache.org> > > wrote: > > > > > The log is pretty clear; the cuboid is exact match, but the scan count > is > > > massive: > > > > > > Visiting hbase table KYLIN_RIK9O18H07: cuboid exact match, from 992 to > > 992 > > > Total scan count: 12306477 > > > > > > Please add where condition to narrow down the result set as much as > > > possible; It doesn't make sense for an OLAP query to return millions of > > > records; > > > > > > 2016-08-04 13:05 GMT+08:00 Jason Hale <ja...@koddi.com>: > > > > > > > Sure, see kylin.log below: > > > > > > > > 2016-08-04 00:47:35,839 INFO [http-bio-7070-exec-7] > > > > controller.QueryController:175 : The original query: SELECT > > SUM(clicks) > > > > FROM hpa_reporting2 GROUP BY site_id, child_id, search_type, > hotel_id, > > > > report_date > > > > 2016-08-04 00:47:35,839 INFO [http-bio-7070-exec-7] > > > > service.QueryService:266 : The corrected query: SELECT SUM(clicks) > FROM > > > > hpa_reporting2 GROUP BY site_id, child_id, search_type, hotel_id, > > > > report_date > > > > LIMIT 50000 > > > > 2016-08-04 00:47:35,908 INFO [http-bio-7070-exec-7] > > > routing.QueryRouter:48 > > > > : The project manager's reference is > > > > org.apache.kylin.metadata.project.ProjectManager@3a3735a5 > > > > 2016-08-04 00:47:35,909 INFO [http-bio-7070-exec-7] > > > routing.QueryRouter:60 > > > > : Find candidates by table DEFAULT.HPA_REPORTING2 and > > project=KODDI_DEV : > > > > org.apache.kylin.query.routing.Candidate@51ed1b3b > > > > 2016-08-04 00:47:35,909 INFO [http-bio-7070-exec-7] > > > routing.QueryRouter:49 > > > > : Applying rule: class > > > > org.apache.kylin.query.routing.rules.RemoveUncapableRealizationsRule, > > > > realizations before: [hpa_reporting2_cube_clone(CUBE)], realizations > > > > after: > > > > [hpa_reporting2_cube_clone(CUBE)] > > > > 2016-08-04 00:47:35,910 INFO [http-bio-7070-exec-7] > > > routing.QueryRouter:49 > > > > : Applying rule: class > > > > org.apache.kylin.query.routing.rules.RealizationSortRule, > realizations > > > > before: [hpa_reporting2_cube_clone(CUBE)], realizations after: > > > > [hpa_reporting2_cube_clone(CUBE)] > > > > 2016-08-04 00:47:35,910 INFO [http-bio-7070-exec-7] > > > routing.QueryRouter:72 > > > > : The realizations remaining: [hpa_reporting2_cube_clone(CUBE)] And > the > > > > final chosen one is the first one > > > > 2016-08-04 00:47:35,975 DEBUG [http-bio-7070-exec-7] > > > > enumerator.OLAPEnumerator:107 : query storage... > > > > 2016-08-04 00:47:35,976 INFO [http-bio-7070-exec-7] > > > > v2.CubeStorageQuery:239 : exactAggregation is true > > > > 2016-08-04 00:47:35,976 INFO [http-bio-7070-exec-7] > > > > v2.CubeStorageQuery:357 : Enable limit 50000 > > > > 2016-08-04 00:47:35,977 DEBUG [http-bio-7070-exec-7] > > > > v2.CubeHBaseEndpointRPC:257 : New scanner for current segment > > > > hpa_reporting2_cube_clone[19700101000000_20160828000000] will use > > > > SCAN_FILTER_AGGR_CHECKMEM as endpoint's behavior > > > > 2016-08-04 00:47:35,979 DEBUG [http-bio-7070-exec-7] > > > > v2.CubeHBaseEndpointRPC:313 : Serialized scanRequestBytes 836 bytes, > > > > rawScanBytesString 56 bytes > > > > 2016-08-04 00:47:35,979 INFO [http-bio-7070-exec-7] > > > > v2.CubeHBaseEndpointRPC:315 : The scan 31b2dd4c for segment > > > > hpa_reporting2_cube_clone[19700101000000_20160828000000] is as below > > with > > > > 1 > > > > separate raw scans, shard part of start/end key is set to 0 > > > > 2016-08-04 00:47:35,980 INFO [http-bio-7070-exec-7] > > v2.CubeHBaseRPC:271 > > > : > > > > Visiting hbase table KYLIN_RIK9O18H07: cuboid exact match, from 992 > to > > > 992 > > > > Start: > > > > \x00\x00\x00\x00\x00\x00\x00\x00\x03\xE0\x00\x00\x00\x00\ > > > > x00\x00\x00\x00\x00 > > > > (\x00\x00\x00\x00\x00\x00\x00\x00\x03\xE0\x00\x00\x00\x00\ > > > > x00\x00\x00\x00\x00) > > > > Stop: > > > > \x00\x00\x00\x00\x00\x00\x00\x00\x03\xE0\xFF\xFF\xFF\xFF\ > > > > xFF\xFF\xFF\xFF\xFF\x00 > > > > (\x00\x00\x00\x00\x00\x00\x00\x00\x03\xE0\xFF\xFF\xFF\xFF\ > > > > xFF\xFF\xFF\xFF\xFF\x00), > > > > No Fuzzy Key > > > > 2016-08-04 00:47:35,981 DEBUG [http-bio-7070-exec-7] > > > > v2.CubeHBaseEndpointRPC:320 : Submitting rpc to 1 shards starting > from > > > > shard 2, scan range count 1 > > > > 2016-08-04 00:47:35,981 INFO [http-bio-7070-exec-7] > > > > v2.CubeHBaseEndpointRPC:103 : Timeout for ExpectedSizeIterator is: > > 99000 > > > > 2016-08-04 00:47:35,981 DEBUG [http-bio-7070-exec-7] > > > > enumerator.OLAPEnumerator:127 : return TupleIterator... > > > > 2016-08-04 00:47:52,773 INFO [pool-6-thread-1] > > > v2.CubeHBaseEndpointRPC:351 > > > > : <sub-thread for GTScanRequest 31b2dd4c> Endpoint RPC returned from > > > HTable > > > > KYLIN_RIK9O18H07 Shard > > > > \x4B\x59\x4C\x49\x4E\x5F\x52\x49\x4B\x39\x4F\x31\x38\x48\ > > > > x30\x37\x2C\x00\x02\x2C\x31\x34\x37\x30\x31\x35\x35\x33\ > > > > x31\x34\x39\x33\x37\x2E\x61\x33\x61\x35\x34\x37\x39\x61\ > > > > x32\x63\x37\x61\x61\x64\x30\x36\x33\x66\x30\x33\x64\x63\ > > > > x34\x65\x31\x30\x36\x33\x61\x33\x61\x37\x2E > > > > on host: ip-10-0-0-157.ec2.internal.Total scanned row: 12306477. > Total > > > > filtered/aggred row: 0. Time elapsed in EP: 16562(ms). Server CPU > > usage: > > > > 0.24348086721950246, server physical mem left: 7.195234304E9, server > > swap > > > > mem left:0.0.Etc message: start latency: 15@1,agg done@13760 > ,compress > > > > done@16562,server stats done@16562, > > > > debugGitTag:cf4d2940b67d622eacd2ac9a913b221091a35c2e;.Normal > Complete: > > > > true. > > > > 2016-08-04 00:47:54,068 DEBUG [pool-6-thread-1] > > util.CompressionUtils:67 > > > : > > > > Original: 46465726 bytes. Decompressed: 150553629 bytes. Time: 1294 > > > > 2016-08-04 00:48:29,303 INFO [pool-4-thread-1] > > > > threadpool.DefaultScheduler:106 : Job Fetcher: 0 running, 0 actual > > > > running, > > > > 0 ready, 12 others > > > > 2016-08-04 00:48:31,990 INFO [http-bio-7070-exec-7] > > > > service.QueryService:399 : Scan count for each storageContext: > > 12306477, > > > > 2016-08-04 00:48:31,991 INFO [http-bio-7070-exec-7] > > > > controller.QueryController:197 : Stats of SQL response: isException: > > > false, > > > > duration: 56152, total scan count 12306477 > > > > 2016-08-04 00:48:32,000 WARN [http-bio-7070-exec-7] > > > > sizeof.ObjectGraphWalker:209 : The configured limit of 1,000 object > > > > references was reached while attempting to calculate the size of the > > > object > > > > graph. Severe performance degradation could occur if the sizing > > operation > > > > continues. This can be avoided by setting the CacheManger or Cache > > > > <sizeOfPolicy> elements maxDepthExceededBehavior to "abort" or adding > > > stop > > > > points with @IgnoreSizeOf annotations. If performance degradation is > > NOT > > > an > > > > issue at the configured limit, raise the limit value using the > > > CacheManager > > > > or Cache <sizeOfPolicy> elements maxDepth attribute. For more > > > information, > > > > see the Ehcache configuration documentation. > > > > 2016-08-04 00:48:32,091 INFO [http-bio-7070-exec-7] > > > > service.QueryService:250 : > > > > ==========================[QUERY]=============================== > > > > SQL: SELECT SUM(clicks) FROM hpa_reporting2 GROUP BY site_id, > child_id, > > > > search_type, hotel_id, report_date > > > > User: ADMIN > > > > Success: true > > > > Duration: 56.152 > > > > Project: koddi_dev > > > > Realization Names: [hpa_reporting2_cube_clone] > > > > Cuboid Ids: [992] > > > > Total scan count: 12306477 > > > > Result row count: 50000 > > > > Accept Partial: true > > > > Is Partial Result: false > > > > Hit Exception Cache: false > > > > Storage cache used: false > > > > Message: null > > > > ==========================[QUERY]=============================== > > > > > > > > On Wed, Aug 3, 2016 at 8:38 PM, ShaoFeng Shi <shaofeng...@apache.org > > > > > > wrote: > > > > > > > > > Hi Jason, could you please provide the full log since sending query > > to > > > > and > > > > > getting result back? The key information is which cuboid is used > for > > > the > > > > > query, cuboid exact match or fuzzy match, how many records be > scanned > > > and > > > > > how long it tooks; Thanks. > > > > > > > > > > 2016-08-03 23:19 GMT+08:00 Jason Hale <ja...@koddi.com>: > > > > > > > > > > > Yes, it would have to do post-aggregation in that case, but the > > > strange > > > > > > thing is that query was running fast (about 1 second), while > > queries > > > > with > > > > > > more dimensions, such as "SELECT SUM(clicks) FROM reporting GROUP > > BY > > > > > > site_id, child_id, report_date, hotel_id". This query will take > > about > > > > 106 > > > > > > seconds, but it shouldn't need to do any post-aggregation so I > > would > > > > > think > > > > > > it should return much quicker than that from the respective > cuboid. > > > > > > > > > > > > Here's the explain plan: > > > > > > OLAPToEnumerableConverter > > > > > > OLAPProjectRel(EXPR$0=[$4]) > > > > > > OLAPAggregateRel(group=[{0, 1, 2, 3}], EXPR$0=[SUM($4)]) > > > > > > OLAPProjectRel(SITE_ID=[$9], CHILD_ID=[$3], REPORT_DATE=[$0], > > > > > > HOTEL_ID=[$2], CLICKS=[$10]) > > > > > > OLAPTableScan(table=[[DEFAULT, HPA_REPORTING2]], fields=[[0, 1, > 2, > > 3, > > > > 4, > > > > > 5, > > > > > > 6, 7, 8, 9, 10, 11]]) > > > > > > > > > > > > On Tue, Aug 2, 2016 at 7:46 PM, ShaoFeng Shi < > > shaofeng...@apache.org > > > > > > > > > > wrote: > > > > > > > > > > > > > In the cube definition, you defined "SITE_ID", "CHILD_ID" as > > > > > "Mandatory" > > > > > > > dimension, which means they will not be aggregated in cube > build > > > > phase > > > > > > for > > > > > > > all combinations. > > > > > > > > > > > > > > So when you run a query like "SELECT SUM(clicks) FROM > reporting > > > > GROUP > > > > > BY > > > > > > > search_type", Kylin will use the combination "SITE_ID" + > > > "CHILD_ID" > > > > + > > > > > > > "SEARCH_TYPE" to serve, there will be post-aggregation in > > runtime; > > > > The > > > > > > > performance is much depent on the cardinality of "SITE_ID" and > > > > > > "CHILD_ID". > > > > > > > > > > > > > > > > > > > > > 2016-08-02 23:08 GMT+08:00 Jason Hale <ja...@koddi.com>: > > > > > > > > > > > > > > > I've looked over the optimization options before, but did not > > > > notice > > > > > > the > > > > > > > > rowkey ordering. I can try this and see if this helps me. > This > > is > > > > the > > > > > > > only > > > > > > > > thing I see that I can attempt to optimize further in the > > design, > > > > but > > > > > > > I'll > > > > > > > > provide my cube design below. I only have one measure to keep > > it > > > > > > simple: > > > > > > > > > > > > > > > > { > > > > > > > > "uuid": "4090b854-8f0c-4288-bd73-fc50238a6030", > > > > > > > > "version": "1.5.2", > > > > > > > > "name": "hpa_reporting2_cube", > > > > > > > > "description": "", > > > > > > > > "dimensions": [ > > > > > > > > { > > > > > > > > "name": "DEFAULT.HPA_REPORTING2.REPORT_DATE", > > > > > > > > "table": "DEFAULT.HPA_REPORTING2", > > > > > > > > "column": "REPORT_DATE", > > > > > > > > "derived": null > > > > > > > > }, > > > > > > > > { > > > > > > > > "name": "DEFAULT.HPA_REPORTING2.SEARCH_TYPE", > > > > > > > > "table": "DEFAULT.HPA_REPORTING2", > > > > > > > > "column": "SEARCH_TYPE", > > > > > > > > "derived": null > > > > > > > > }, > > > > > > > > { > > > > > > > > "name": "DEFAULT.HPA_REPORTING2.HOTEL_ID", > > > > > > > > "table": "DEFAULT.HPA_REPORTING2", > > > > > > > > "column": "HOTEL_ID", > > > > > > > > "derived": null > > > > > > > > }, > > > > > > > > { > > > > > > > > "name": "DEFAULT.HPA_REPORTING2.CHILD_ID", > > > > > > > > "table": "DEFAULT.HPA_REPORTING2", > > > > > > > > "column": "CHILD_ID", > > > > > > > > "derived": null > > > > > > > > }, > > > > > > > > { > > > > > > > > "name": "DEFAULT.HPA_REPORTING2.COUNTRY", > > > > > > > > "table": "DEFAULT.HPA_REPORTING2", > > > > > > > > "column": "COUNTRY", > > > > > > > > "derived": null > > > > > > > > }, > > > > > > > > { > > > > > > > > "name": "DEFAULT.HPA_REPORTING2.DEVICE_TYPE", > > > > > > > > "table": "DEFAULT.HPA_REPORTING2", > > > > > > > > "column": "DEVICE_TYPE", > > > > > > > > "derived": null > > > > > > > > }, > > > > > > > > { > > > > > > > > "name": "DEFAULT.HPA_REPORTING2.STAY_LENGTH", > > > > > > > > "table": "DEFAULT.HPA_REPORTING2", > > > > > > > > "column": "STAY_LENGTH", > > > > > > > > "derived": null > > > > > > > > }, > > > > > > > > { > > > > > > > > "name": "DEFAULT.HPA_REPORTING2.TRUE_RANK_AG", > > > > > > > > "table": "DEFAULT.HPA_REPORTING2", > > > > > > > > "column": "TRUE_RANK_AG", > > > > > > > > "derived": null > > > > > > > > }, > > > > > > > > { > > > > > > > > "name": "DEFAULT.HPA_REPORTING2.ROOM_BUNDLE", > > > > > > > > "table": "DEFAULT.HPA_REPORTING2", > > > > > > > > "column": "ROOM_BUNDLE", > > > > > > > > "derived": null > > > > > > > > }, > > > > > > > > { > > > > > > > > "name": "DEFAULT.HPA_REPORTING2.SITE_ID", > > > > > > > > "table": "DEFAULT.HPA_REPORTING2", > > > > > > > > "column": "SITE_ID", > > > > > > > > "derived": null > > > > > > > > } > > > > > > > > ], > > > > > > > > "measures": [ > > > > > > > > { > > > > > > > > "name": "_COUNT_", > > > > > > > > "function": { > > > > > > > > "expression": "COUNT", > > > > > > > > "parameter": { > > > > > > > > "type": "constant", > > > > > > > > "value": "1", > > > > > > > > "next_parameter": null > > > > > > > > }, > > > > > > > > "returntype": "bigint" > > > > > > > > }, > > > > > > > > "dependent_measure_ref": null > > > > > > > > }, > > > > > > > > { > > > > > > > > "name": "CLICKS", > > > > > > > > "function": { > > > > > > > > "expression": "SUM", > > > > > > > > "parameter": { > > > > > > > > "type": "column", > > > > > > > > "value": "CLICKS", > > > > > > > > "next_parameter": null > > > > > > > > }, > > > > > > > > "returntype": "decimal" > > > > > > > > }, > > > > > > > > "dependent_measure_ref": null > > > > > > > > } > > > > > > > > ], > > > > > > > > "rowkey": { > > > > > > > > "rowkey_columns": [ > > > > > > > > { > > > > > > > > "column": "REPORT_DATE", > > > > > > > > "encoding": "dict", > > > > > > > > "isShardBy": false > > > > > > > > }, > > > > > > > > { > > > > > > > > "column": "SEARCH_TYPE", > > > > > > > > "encoding": "dict", > > > > > > > > "isShardBy": false > > > > > > > > }, > > > > > > > > { > > > > > > > > "column": "HOTEL_ID", > > > > > > > > "encoding": "dict", > > > > > > > > "isShardBy": false > > > > > > > > }, > > > > > > > > { > > > > > > > > "column": "CHILD_ID", > > > > > > > > "encoding": "dict", > > > > > > > > "isShardBy": false > > > > > > > > }, > > > > > > > > { > > > > > > > > "column": "COUNTRY", > > > > > > > > "encoding": "dict", > > > > > > > > "isShardBy": false > > > > > > > > }, > > > > > > > > { > > > > > > > > "column": "DEVICE_TYPE", > > > > > > > > "encoding": "dict", > > > > > > > > "isShardBy": false > > > > > > > > }, > > > > > > > > { > > > > > > > > "column": "STAY_LENGTH", > > > > > > > > "encoding": "dict", > > > > > > > > "isShardBy": false > > > > > > > > }, > > > > > > > > { > > > > > > > > "column": "TRUE_RANK_AG", > > > > > > > > "encoding": "dict", > > > > > > > > "isShardBy": false > > > > > > > > }, > > > > > > > > { > > > > > > > > "column": "ROOM_BUNDLE", > > > > > > > > "encoding": "dict", > > > > > > > > "isShardBy": false > > > > > > > > }, > > > > > > > > { > > > > > > > > "column": "SITE_ID", > > > > > > > > "encoding": "dict", > > > > > > > > "isShardBy": false > > > > > > > > } > > > > > > > > ] > > > > > > > > }, > > > > > > > > "signature": "KixlKWxevr6mO+UlSaR5ig==", > > > > > > > > "last_modified": 1470070273935, > > > > > > > > "model_name": "hpa_reporting_model2", > > > > > > > > "null_string": null, > > > > > > > > "hbase_mapping": { > > > > > > > > "column_family": [ > > > > > > > > { > > > > > > > > "name": "F1", > > > > > > > > "columns": [ > > > > > > > > { > > > > > > > > "qualifier": "M", > > > > > > > > "measure_refs": [ > > > > > > > > "_COUNT_", > > > > > > > > "CLICKS" > > > > > > > > ] > > > > > > > > } > > > > > > > > ] > > > > > > > > } > > > > > > > > ] > > > > > > > > }, > > > > > > > > "aggregation_groups": [ > > > > > > > > { > > > > > > > > "includes": [ > > > > > > > > "REPORT_DATE", > > > > > > > > "SEARCH_TYPE", > > > > > > > > "HOTEL_ID", > > > > > > > > "CHILD_ID", > > > > > > > > "COUNTRY", > > > > > > > > "DEVICE_TYPE", > > > > > > > > "STAY_LENGTH", > > > > > > > > "TRUE_RANK_AG", > > > > > > > > "ROOM_BUNDLE", > > > > > > > > "SITE_ID" > > > > > > > > ], > > > > > > > > "select_rule": { > > > > > > > > "hierarchy_dims": [], > > > > > > > > "mandatory_dims": [ > > > > > > > > "SITE_ID", > > > > > > > > "CHILD_ID" > > > > > > > > ], > > > > > > > > "joint_dims": [ > > > > > > > > [ > > > > > > > > "ROOM_BUNDLE", > > > > > > > > "TRUE_RANK_AG" > > > > > > > > ] > > > > > > > > ] > > > > > > > > } > > > > > > > > } > > > > > > > > ], > > > > > > > > "notify_list": [], > > > > > > > > "status_need_notify": [ > > > > > > > > "ERROR", > > > > > > > > "DISCARDED", > > > > > > > > "SUCCEED" > > > > > > > > ], > > > > > > > > "partition_date_start": 0, > > > > > > > > "partition_date_end": 3153600000000, > > > > > > > > "auto_merge_time_ranges": [ > > > > > > > > 604800000, > > > > > > > > 2419200000 > > > > > > > > ], > > > > > > > > "retention_range": 0, > > > > > > > > "engine_type": 2, > > > > > > > > "storage_type": 2, > > > > > > > > "override_kylin_properties": {} > > > > > > > > } > > > > > > > > > > > > > > > > On Mon, Aug 1, 2016 at 8:02 PM, ShaoFeng Shi < > > > > shaofeng...@apache.org > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > Hi Jason, > > > > > > > > > > > > > > > > > > As Yiming mentioned, the cube design matters for the > > > performance > > > > of > > > > > > > both > > > > > > > > > build and query; please check "Optimize Cube" in the > document > > > web > > > > > > page > > > > > > > > and > > > > > > > > > do optimizaiton as much as possible; > > > > > > > > > > > > > > > > > > Besides, the cluster's capacity and Hadoop configuration is > > > also > > > > an > > > > > > > > > important factor; Try to identify the bottleneck and then > > > > optimize > > > > > or > > > > > > > add > > > > > > > > > capacity. > > > > > > > > > > > > > > > > > > From 1.5 Kylin ships with two cubing algorithm; the steps > > > "Build > > > > > > > > > N-Dimension Cuboid" are the legacy "Layered" cubing > > algorithm; > > > > They > > > > > > > will > > > > > > > > be > > > > > > > > > skipped when Kylin selects to use the new "Fast" cubing > > > > algorithm, > > > > > > > which > > > > > > > > is > > > > > > > > > the "Build Cube" step after them. Please click the hadoop > > link > > > in > > > > > > that > > > > > > > > step > > > > > > > > > to inspect the MR job's statistics; > > > > > > > > > > > > > > > > > > Hope this helps to some extend; > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 2016-08-02 8:44 GMT+08:00 Yiming Liu < > > liuyiming....@gmail.com > > > >: > > > > > > > > > > > > > > > > > > > Hi Jason, > > > > > > > > > > > > > > > > > > > > Cube design is the performance key for Kylin, not only > > query, > > > > but > > > > > > > also > > > > > > > > > cube > > > > > > > > > > building process. How to select dimensions, how to define > > the > > > > > > > > > relationship > > > > > > > > > > between dimensions, how to select encode method, how to > > > define > > > > > > > measure, > > > > > > > > > > even how to choose the Hbase key order will have a > > > significant > > > > > > impact > > > > > > > > on > > > > > > > > > > performance. There are quite a few wonderful documents > > > > > introducing > > > > > > > how > > > > > > > > > to > > > > > > > > > > do this, http://kylin.apache.org/docs15/ . > > > > > > > > > > > > > > > > > > > > One more thing, if you could share your cube design, you > > > would > > > > > get > > > > > > > help > > > > > > > > > > easier here. > > > > > > > > > > > > > > > > > > > > 2016-08-02 7:20 GMT+08:00 Jason Hale <ja...@koddi.com>: > > > > > > > > > > > > > > > > > > > > > I'm setting up a test case for a portion of our > dataset, > > to > > > > > > > evaluate > > > > > > > > > > Kylin > > > > > > > > > > > and I'm not seeing the performance that I would expect. > > > > > > > > > > > > > > > > > > > > > > The cube building process is taking about 5-6 hours > with > > > > > > > ~69,000,000 > > > > > > > > > > > records and 10 dimensions. I'm not sure if that's the > > > > expected > > > > > > > build > > > > > > > > > > time, > > > > > > > > > > > but the other problem is the query performance after > > > building > > > > > the > > > > > > > > cube. > > > > > > > > > > > > > > > > > > > > > > All queries were tested with a very simple query (e.g. > > > SELECT > > > > > > > > > SUM(clicks) > > > > > > > > > > > FROM reporting GROUP BY search_type) > > > > > > > > > > > > > > > > > > > > > > Grouping by 1 or 2 dimensions gives me very responsive > > > > queries > > > > > > > > (under 2 > > > > > > > > > > > seconds), but adding more dimensions drastically > > increases > > > > the > > > > > > > query > > > > > > > > > time > > > > > > > > > > > (over 1 minute and it times out through hbase). I would > > > > expect > > > > > > > these > > > > > > > > > > > queries to have all similar query times since they > should > > > > query > > > > > > the > > > > > > > > > > > respective cuboid, so I'm not sure why the performance > > > would > > > > > > > suffer. > > > > > > > > I > > > > > > > > > > > didn't set up any special rules for the cube, but > during > > > the > > > > > > build > > > > > > > > > > process > > > > > > > > > > > it showed all the N-dimension cubes and the log simply > > said > > > > > > > > 'skipped'. > > > > > > > > > > > > > > > > > > > > > > Is there something I'm missing in the configuration? > > > > > > > > > > > > > > > > > > > > > > I have a HDP cluster with 3 nodes and 1 client node on > > > which > > > > > > Kylin > > > > > > > is > > > > > > > > > > > installed. Do I need to adjust the hadoop > configuration. > > > I'm > > > > > > using > > > > > > > > most > > > > > > > > > > of > > > > > > > > > > > the default HDP settings. > > > > > > > > > > > > > > > > > > > > > > What more information can I provide? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > > > With Warm regards > > > > > > > > > > > > > > > > > > > > Yiming Liu (刘一鸣) > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > > Best regards, > > > > > > > > > > > > > > > > > > Shaofeng Shi > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > Best regards, > > > > > > > > > > > > > > Shaofeng Shi > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > Best regards, > > > > > > > > > > Shaofeng Shi > > > > > > > > > > > > > > > > > > > > > -- > > > Best regards, > > > > > > Shaofeng Shi > > > > > > > > > > > -- > > Regards, > > > > *Bin Mahone | 马洪宾* > > > -- Regards, *Bin Mahone | 马洪宾*