Yes, it would have to do post-aggregation in that case, but the strange
thing is that query was running fast (about 1 second), while queries with
more dimensions, such as "SELECT SUM(clicks) FROM reporting GROUP BY
site_id, child_id, report_date, hotel_id". This query will take about 106
seconds, but it shouldn't need to do any post-aggregation so I would think
it should return much quicker than that from the respective cuboid.

Here's the explain plan:
OLAPToEnumerableConverter
OLAPProjectRel(EXPR$0=[$4])
OLAPAggregateRel(group=[{0, 1, 2, 3}], EXPR$0=[SUM($4)])
OLAPProjectRel(SITE_ID=[$9], CHILD_ID=[$3], REPORT_DATE=[$0],
HOTEL_ID=[$2], CLICKS=[$10])
OLAPTableScan(table=[[DEFAULT, HPA_REPORTING2]], fields=[[0, 1, 2, 3, 4, 5,
6, 7, 8, 9, 10, 11]])

On Tue, Aug 2, 2016 at 7:46 PM, ShaoFeng Shi <[email protected]> wrote:

> In the cube definition, you defined "SITE_ID", "CHILD_ID" as "Mandatory"
> dimension, which means they will not be aggregated in cube build phase for
> all combinations.
>
> So when you run a query like  "SELECT SUM(clicks) FROM reporting GROUP BY
> search_type", Kylin will use the combination  "SITE_ID" + "CHILD_ID" +
> "SEARCH_TYPE" to serve, there will be post-aggregation in runtime; The
> performance is much depent on the cardinality of "SITE_ID" and "CHILD_ID".
>
>
> 2016-08-02 23:08 GMT+08:00 Jason Hale <[email protected]>:
>
> > I've looked over the optimization options before, but did not notice the
> > rowkey ordering. I can try this and see if this helps me. This is the
> only
> > thing I see that I can attempt to optimize further in the design, but
> I'll
> > provide my cube design below. I only have one measure to keep it simple:
> >
> > {
> >   "uuid": "4090b854-8f0c-4288-bd73-fc50238a6030",
> >   "version": "1.5.2",
> >   "name": "hpa_reporting2_cube",
> >   "description": "",
> >   "dimensions": [
> >     {
> >       "name": "DEFAULT.HPA_REPORTING2.REPORT_DATE",
> >       "table": "DEFAULT.HPA_REPORTING2",
> >       "column": "REPORT_DATE",
> >       "derived": null
> >     },
> >     {
> >       "name": "DEFAULT.HPA_REPORTING2.SEARCH_TYPE",
> >       "table": "DEFAULT.HPA_REPORTING2",
> >       "column": "SEARCH_TYPE",
> >       "derived": null
> >     },
> >     {
> >       "name": "DEFAULT.HPA_REPORTING2.HOTEL_ID",
> >       "table": "DEFAULT.HPA_REPORTING2",
> >       "column": "HOTEL_ID",
> >       "derived": null
> >     },
> >     {
> >       "name": "DEFAULT.HPA_REPORTING2.CHILD_ID",
> >       "table": "DEFAULT.HPA_REPORTING2",
> >       "column": "CHILD_ID",
> >       "derived": null
> >     },
> >     {
> >       "name": "DEFAULT.HPA_REPORTING2.COUNTRY",
> >       "table": "DEFAULT.HPA_REPORTING2",
> >       "column": "COUNTRY",
> >       "derived": null
> >     },
> >     {
> >       "name": "DEFAULT.HPA_REPORTING2.DEVICE_TYPE",
> >       "table": "DEFAULT.HPA_REPORTING2",
> >       "column": "DEVICE_TYPE",
> >       "derived": null
> >     },
> >     {
> >       "name": "DEFAULT.HPA_REPORTING2.STAY_LENGTH",
> >       "table": "DEFAULT.HPA_REPORTING2",
> >       "column": "STAY_LENGTH",
> >       "derived": null
> >     },
> >     {
> >       "name": "DEFAULT.HPA_REPORTING2.TRUE_RANK_AG",
> >       "table": "DEFAULT.HPA_REPORTING2",
> >       "column": "TRUE_RANK_AG",
> >       "derived": null
> >     },
> >     {
> >       "name": "DEFAULT.HPA_REPORTING2.ROOM_BUNDLE",
> >       "table": "DEFAULT.HPA_REPORTING2",
> >       "column": "ROOM_BUNDLE",
> >       "derived": null
> >     },
> >     {
> >       "name": "DEFAULT.HPA_REPORTING2.SITE_ID",
> >       "table": "DEFAULT.HPA_REPORTING2",
> >       "column": "SITE_ID",
> >       "derived": null
> >     }
> >   ],
> >   "measures": [
> >     {
> >       "name": "_COUNT_",
> >       "function": {
> >         "expression": "COUNT",
> >         "parameter": {
> >           "type": "constant",
> >           "value": "1",
> >           "next_parameter": null
> >         },
> >         "returntype": "bigint"
> >       },
> >       "dependent_measure_ref": null
> >     },
> >     {
> >       "name": "CLICKS",
> >       "function": {
> >         "expression": "SUM",
> >         "parameter": {
> >           "type": "column",
> >           "value": "CLICKS",
> >           "next_parameter": null
> >         },
> >         "returntype": "decimal"
> >       },
> >       "dependent_measure_ref": null
> >     }
> >   ],
> >   "rowkey": {
> >     "rowkey_columns": [
> >       {
> >         "column": "REPORT_DATE",
> >         "encoding": "dict",
> >         "isShardBy": false
> >       },
> >       {
> >         "column": "SEARCH_TYPE",
> >         "encoding": "dict",
> >         "isShardBy": false
> >       },
> >       {
> >         "column": "HOTEL_ID",
> >         "encoding": "dict",
> >         "isShardBy": false
> >       },
> >       {
> >         "column": "CHILD_ID",
> >         "encoding": "dict",
> >         "isShardBy": false
> >       },
> >       {
> >         "column": "COUNTRY",
> >         "encoding": "dict",
> >         "isShardBy": false
> >       },
> >       {
> >         "column": "DEVICE_TYPE",
> >         "encoding": "dict",
> >         "isShardBy": false
> >       },
> >       {
> >         "column": "STAY_LENGTH",
> >         "encoding": "dict",
> >         "isShardBy": false
> >       },
> >       {
> >         "column": "TRUE_RANK_AG",
> >         "encoding": "dict",
> >         "isShardBy": false
> >       },
> >       {
> >         "column": "ROOM_BUNDLE",
> >         "encoding": "dict",
> >         "isShardBy": false
> >       },
> >       {
> >         "column": "SITE_ID",
> >         "encoding": "dict",
> >         "isShardBy": false
> >       }
> >     ]
> >   },
> >   "signature": "KixlKWxevr6mO+UlSaR5ig==",
> >   "last_modified": 1470070273935,
> >   "model_name": "hpa_reporting_model2",
> >   "null_string": null,
> >   "hbase_mapping": {
> >     "column_family": [
> >       {
> >         "name": "F1",
> >         "columns": [
> >           {
> >             "qualifier": "M",
> >             "measure_refs": [
> >               "_COUNT_",
> >               "CLICKS"
> >             ]
> >           }
> >         ]
> >       }
> >     ]
> >   },
> >   "aggregation_groups": [
> >     {
> >       "includes": [
> >         "REPORT_DATE",
> >         "SEARCH_TYPE",
> >         "HOTEL_ID",
> >         "CHILD_ID",
> >         "COUNTRY",
> >         "DEVICE_TYPE",
> >         "STAY_LENGTH",
> >         "TRUE_RANK_AG",
> >         "ROOM_BUNDLE",
> >         "SITE_ID"
> >       ],
> >       "select_rule": {
> >         "hierarchy_dims": [],
> >         "mandatory_dims": [
> >           "SITE_ID",
> >           "CHILD_ID"
> >         ],
> >         "joint_dims": [
> >           [
> >             "ROOM_BUNDLE",
> >             "TRUE_RANK_AG"
> >           ]
> >         ]
> >       }
> >     }
> >   ],
> >   "notify_list": [],
> >   "status_need_notify": [
> >     "ERROR",
> >     "DISCARDED",
> >     "SUCCEED"
> >   ],
> >   "partition_date_start": 0,
> >   "partition_date_end": 3153600000000,
> >   "auto_merge_time_ranges": [
> >     604800000,
> >     2419200000
> >   ],
> >   "retention_range": 0,
> >   "engine_type": 2,
> >   "storage_type": 2,
> >   "override_kylin_properties": {}
> > }
> >
> > On Mon, Aug 1, 2016 at 8:02 PM, ShaoFeng Shi <[email protected]>
> > wrote:
> >
> > > Hi Jason,
> > >
> > > As Yiming mentioned, the cube design matters for the performance of
> both
> > > build and query; please check "Optimize Cube" in the document web page
> > and
> > > do optimizaiton as much as possible;
> > >
> > > Besides, the cluster's capacity and Hadoop configuration is also an
> > > important factor; Try to identify the bottleneck and then optimize or
> add
> > > capacity.
> > >
> > > From 1.5 Kylin ships with two cubing algorithm; the steps "Build
> > > N-Dimension Cuboid" are the legacy "Layered" cubing algorithm; They
> will
> > be
> > > skipped when Kylin selects to use the new "Fast" cubing algorithm,
> which
> > is
> > > the "Build Cube" step after them. Please click the hadoop link in that
> > step
> > > to inspect the MR job's statistics;
> > >
> > > Hope this helps to some extend;
> > >
> > >
> > >
> > > 2016-08-02 8:44 GMT+08:00 Yiming Liu <[email protected]>:
> > >
> > > > Hi Jason,
> > > >
> > > > Cube design is the performance key for Kylin, not only query, but
> also
> > > cube
> > > > building process. How to select dimensions, how to define the
> > > relationship
> > > > between dimensions, how to select encode method, how to define
> measure,
> > > > even how to choose the Hbase key order will have a significant impact
> > on
> > > > performance.  There are quite a few wonderful documents introducing
> how
> > > to
> > > > do this, http://kylin.apache.org/docs15/ .
> > > >
> > > > One more thing, if you could share your cube design, you would get
> help
> > > > easier here.
> > > >
> > > > 2016-08-02 7:20 GMT+08:00 Jason Hale <[email protected]>:
> > > >
> > > > > I'm setting up a test case for a portion of our dataset, to
> evaluate
> > > > Kylin
> > > > > and I'm not seeing the performance that I would expect.
> > > > >
> > > > > The cube building process is taking about 5-6 hours with
> ~69,000,000
> > > > > records and 10 dimensions. I'm not sure if that's the expected
> build
> > > > time,
> > > > > but the other problem is the query performance after building the
> > cube.
> > > > >
> > > > > All queries were tested with a very simple query (e.g. SELECT
> > > SUM(clicks)
> > > > > FROM reporting GROUP BY search_type)
> > > > >
> > > > > Grouping by 1 or 2 dimensions gives me very responsive queries
> > (under 2
> > > > > seconds), but adding more dimensions drastically increases the
> query
> > > time
> > > > > (over 1 minute and it times out through hbase). I would expect
> these
> > > > > queries to have all similar query times since they should query the
> > > > > respective cuboid, so I'm not sure why the performance would
> suffer.
> > I
> > > > > didn't set up any special rules for the cube, but during the build
> > > > process
> > > > > it showed all the N-dimension cubes and the log simply said
> > 'skipped'.
> > > > >
> > > > > Is there something I'm missing in the configuration?
> > > > >
> > > > > I have a HDP cluster with 3 nodes and 1 client node on which Kylin
> is
> > > > > installed. Do I need to adjust the hadoop configuration. I'm using
> > most
> > > > of
> > > > > the default HDP settings.
> > > > >
> > > > > What more information can I provide?
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > With Warm regards
> > > >
> > > > Yiming Liu (刘一鸣)
> > > >
> > >
> > >
> > >
> > > --
> > > Best regards,
> > >
> > > Shaofeng Shi
> > >
> >
>
>
>
> --
> Best regards,
>
> Shaofeng Shi
>

Reply via email to