Re: Kylin Cube Performance

Jason Hale Tue, 02 Aug 2016 08:09:01 -0700

I've looked over the optimization options before, but did not notice the
rowkey ordering. I can try this and see if this helps me. This is the only
thing I see that I can attempt to optimize further in the design, but I'll
provide my cube design below. I only have one measure to keep it simple:


{
  "uuid": "4090b854-8f0c-4288-bd73-fc50238a6030",
  "version": "1.5.2",
  "name": "hpa_reporting2_cube",
  "description": "",
  "dimensions": [
    {
      "name": "DEFAULT.HPA_REPORTING2.REPORT_DATE",
      "table": "DEFAULT.HPA_REPORTING2",
      "column": "REPORT_DATE",
      "derived": null
    },
    {
      "name": "DEFAULT.HPA_REPORTING2.SEARCH_TYPE",
      "table": "DEFAULT.HPA_REPORTING2",
      "column": "SEARCH_TYPE",
      "derived": null
    },
    {
      "name": "DEFAULT.HPA_REPORTING2.HOTEL_ID",
      "table": "DEFAULT.HPA_REPORTING2",
      "column": "HOTEL_ID",
      "derived": null
    },
    {
      "name": "DEFAULT.HPA_REPORTING2.CHILD_ID",
      "table": "DEFAULT.HPA_REPORTING2",
      "column": "CHILD_ID",
      "derived": null
    },
    {
      "name": "DEFAULT.HPA_REPORTING2.COUNTRY",
      "table": "DEFAULT.HPA_REPORTING2",
      "column": "COUNTRY",
      "derived": null
    },
    {
      "name": "DEFAULT.HPA_REPORTING2.DEVICE_TYPE",
      "table": "DEFAULT.HPA_REPORTING2",
      "column": "DEVICE_TYPE",
      "derived": null
    },
    {
      "name": "DEFAULT.HPA_REPORTING2.STAY_LENGTH",
      "table": "DEFAULT.HPA_REPORTING2",
      "column": "STAY_LENGTH",
      "derived": null
    },
    {
      "name": "DEFAULT.HPA_REPORTING2.TRUE_RANK_AG",
      "table": "DEFAULT.HPA_REPORTING2",
      "column": "TRUE_RANK_AG",
      "derived": null
    },
    {
      "name": "DEFAULT.HPA_REPORTING2.ROOM_BUNDLE",
      "table": "DEFAULT.HPA_REPORTING2",
      "column": "ROOM_BUNDLE",
      "derived": null
    },
    {
      "name": "DEFAULT.HPA_REPORTING2.SITE_ID",
      "table": "DEFAULT.HPA_REPORTING2",
      "column": "SITE_ID",
      "derived": null
    }
  ],
  "measures": [
    {
      "name": "_COUNT_",
      "function": {
        "expression": "COUNT",
        "parameter": {
          "type": "constant",
          "value": "1",
          "next_parameter": null
        },
        "returntype": "bigint"
      },
      "dependent_measure_ref": null
    },
    {
      "name": "CLICKS",
      "function": {
        "expression": "SUM",
        "parameter": {
          "type": "column",
          "value": "CLICKS",
          "next_parameter": null
        },
        "returntype": "decimal"
      },
      "dependent_measure_ref": null
    }
  ],
  "rowkey": {
    "rowkey_columns": [
      {
        "column": "REPORT_DATE",
        "encoding": "dict",
        "isShardBy": false
      },
      {
        "column": "SEARCH_TYPE",
        "encoding": "dict",
        "isShardBy": false
      },
      {
        "column": "HOTEL_ID",
        "encoding": "dict",
        "isShardBy": false
      },
      {
        "column": "CHILD_ID",
        "encoding": "dict",
        "isShardBy": false
      },
      {
        "column": "COUNTRY",
        "encoding": "dict",
        "isShardBy": false
      },
      {
        "column": "DEVICE_TYPE",
        "encoding": "dict",
        "isShardBy": false
      },
      {
        "column": "STAY_LENGTH",
        "encoding": "dict",
        "isShardBy": false
      },
      {
        "column": "TRUE_RANK_AG",
        "encoding": "dict",
        "isShardBy": false
      },
      {
        "column": "ROOM_BUNDLE",
        "encoding": "dict",
        "isShardBy": false
      },
      {
        "column": "SITE_ID",
        "encoding": "dict",
        "isShardBy": false
      }
    ]
  },
  "signature": "KixlKWxevr6mO+UlSaR5ig==",
  "last_modified": 1470070273935,
  "model_name": "hpa_reporting_model2",
  "null_string": null,
  "hbase_mapping": {
    "column_family": [
      {
        "name": "F1",
        "columns": [
          {
            "qualifier": "M",
            "measure_refs": [
              "_COUNT_",
              "CLICKS"
            ]
          }
        ]
      }
    ]
  },
  "aggregation_groups": [
    {
      "includes": [
        "REPORT_DATE",
        "SEARCH_TYPE",
        "HOTEL_ID",
        "CHILD_ID",
        "COUNTRY",
        "DEVICE_TYPE",
        "STAY_LENGTH",
        "TRUE_RANK_AG",
        "ROOM_BUNDLE",
        "SITE_ID"
      ],
      "select_rule": {
        "hierarchy_dims": [],
        "mandatory_dims": [
          "SITE_ID",
          "CHILD_ID"
        ],
        "joint_dims": [
          [
            "ROOM_BUNDLE",
            "TRUE_RANK_AG"
          ]
        ]
      }
    }
  ],
  "notify_list": [],
  "status_need_notify": [
    "ERROR",
    "DISCARDED",
    "SUCCEED"
  ],
  "partition_date_start": 0,
  "partition_date_end": 3153600000000,
  "auto_merge_time_ranges": [
    604800000,
    2419200000
  ],
  "retention_range": 0,
  "engine_type": 2,
  "storage_type": 2,
  "override_kylin_properties": {}
}

On Mon, Aug 1, 2016 at 8:02 PM, ShaoFeng Shi <[email protected]> wrote:

> Hi Jason,
>
> As Yiming mentioned, the cube design matters for the performance of both
> build and query; please check "Optimize Cube" in the document web page and
> do optimizaiton as much as possible;
>
> Besides, the cluster's capacity and Hadoop configuration is also an
> important factor; Try to identify the bottleneck and then optimize or add
> capacity.
>
> From 1.5 Kylin ships with two cubing algorithm; the steps "Build
> N-Dimension Cuboid" are the legacy "Layered" cubing algorithm; They will be
> skipped when Kylin selects to use the new "Fast" cubing algorithm, which is
> the "Build Cube" step after them. Please click the hadoop link in that step
> to inspect the MR job's statistics;
>
> Hope this helps to some extend;
>
>
>
> 2016-08-02 8:44 GMT+08:00 Yiming Liu <[email protected]>:
>
> > Hi Jason,
> >
> > Cube design is the performance key for Kylin, not only query, but also
> cube
> > building process. How to select dimensions, how to define the
> relationship
> > between dimensions, how to select encode method, how to define measure,
> > even how to choose the Hbase key order will have a significant impact on
> > performance.  There are quite a few wonderful documents introducing how
> to
> > do this, http://kylin.apache.org/docs15/ .
> >
> > One more thing, if you could share your cube design, you would get help
> > easier here.
> >
> > 2016-08-02 7:20 GMT+08:00 Jason Hale <[email protected]>:
> >
> > > I'm setting up a test case for a portion of our dataset, to evaluate
> > Kylin
> > > and I'm not seeing the performance that I would expect.
> > >
> > > The cube building process is taking about 5-6 hours with  ~69,000,000
> > > records and 10 dimensions. I'm not sure if that's the expected build
> > time,
> > > but the other problem is the query performance after building the cube.
> > >
> > > All queries were tested with a very simple query (e.g. SELECT
> SUM(clicks)
> > > FROM reporting GROUP BY search_type)
> > >
> > > Grouping by 1 or 2 dimensions gives me very responsive queries (under 2
> > > seconds), but adding more dimensions drastically increases the query
> time
> > > (over 1 minute and it times out through hbase). I would expect these
> > > queries to have all similar query times since they should query the
> > > respective cuboid, so I'm not sure why the performance would suffer. I
> > > didn't set up any special rules for the cube, but during the build
> > process
> > > it showed all the N-dimension cubes and the log simply said 'skipped'.
> > >
> > > Is there something I'm missing in the configuration?
> > >
> > > I have a HDP cluster with 3 nodes and 1 client node on which Kylin is
> > > installed. Do I need to adjust the hadoop configuration. I'm using most
> > of
> > > the default HDP settings.
> > >
> > > What more information can I provide?
> > >
> >
> >
> >
> > --
> > With Warm regards
> >
> > Yiming Liu (刘一鸣)
> >
>
>
>
> --
> Best regards,
>
> Shaofeng Shi
>

Re: Kylin Cube Performance

Reply via email to