I have only two dimensions and two measures (plus the default _COUNT_). The intermediate table was created quickly, but then the job got stuck at base_cuboid_builder. I saw two map tasks running:

hdfs://hdfs-batch-layer/user/kylin/kylin_metadata/kylin-d9be0027-a0f5-4a99-bca1-f18841e3c29e/kylin_intermediate_logout_full_cube_19700101000000_20160626000000/000000_0:0+33554432 > sort

hdfs://hdfs-batch-layer/user/kylin/kylin_metadata/kylin-d9be0027-a0f5-4a99-bca1-f18841e3c29e/kylin_intermediate_logout_full_cube_19700101000000_20160626000000/000000_0:33554432+25438799 > sort

Both were running at 100% CPU utilization.
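Each task label above ends in Hadoop's FileSplit notation, `path:start+length`, where both numbers are in bytes. As a rough sanity check, here is a small Python sketch that decodes the two descriptors to show how much input each mapper received. The helper names (`Split`, `parse_split`) are hypothetical and exist only for this illustration:

```python
import re
from dataclasses import dataclass

# Hadoop's FileSplit renders itself as "<path>:<start offset>+<length in bytes>".
# The greedy ".+" backtracks to the LAST colon, so "hdfs://" in the path is safe.
SPLIT_RE = re.compile(r"^(?P<path>.+):(?P<start>\d+)\+(?P<length>\d+)$")


@dataclass
class Split:
    path: str
    start: int   # byte offset into the file
    length: int  # number of bytes assigned to this map task


def parse_split(text: str) -> Split:
    """Parse one 'path:start+length' split descriptor."""
    m = SPLIT_RE.match(text.strip())
    if m is None:
        raise ValueError(f"not a split descriptor: {text!r}")
    return Split(m["path"], int(m["start"]), int(m["length"]))


if __name__ == "__main__":
    base = ("hdfs://hdfs-batch-layer/user/kylin/kylin_metadata/"
            "kylin-d9be0027-a0f5-4a99-bca1-f18841e3c29e/"
            "kylin_intermediate_logout_full_cube_19700101000000_20160626000000/000000_0")
    splits = [parse_split(base + ":0+33554432"),
              parse_split(base + ":33554432+25438799")]
    for s in splits:
        print(f"offset {s.start:>10}  length {s.length:>10} bytes "
              f"({s.length / 2**20:.1f} MiB)")
    total = sum(s.length for s in splits)
    print(f"total {total} bytes ({total / 2**20:.1f} MiB)")
    # The second split should start exactly where the first one ends.
    print("contiguous:", splits[0].start + splits[0].length == splits[1].start)
```

If these two splits are the whole intermediate table, the mappers together read about 56 MiB (32.0 MiB and 24.3 MiB), which makes an hour at 100% CPU look even more out of proportion to the input size.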

The job ran for one hour and was then killed due to a timeout. I do not see any exceptions in the YARN or Kylin logs. The same cube was built successfully with smaller data, and the log shows that memory was not fully used. My guess is that the map tasks are simply slow, but from your experience, something may be wrong. Where should I look more closely? Here is my cube descriptor. Both dimensions have low cardinality:

{
  "uuid": "4974bc14-b792-4fd6-910d-598e6a16bb6d",
  "version": "1.5.2",
  "name": "logout_full_cube",
  "description": "",
  "dimensions": [
    {
      "name": "RAWDATA.USER_LOGOUT_FULL.GAME",
      "table": "RAWDATA.USER_LOGOUT_FULL",
      "column": "GAME",
      "derived": null
    },
    {
      "name": "RAWDATA.USER_LOGOUT_FULL.LANG",
      "table": "RAWDATA.USER_LOGOUT_FULL",
      "column": "LANG",
      "derived": null
    }
  ],
  "measures": [
    {
      "name": "_COUNT_",
      "function": {
        "expression": "COUNT",
        "parameter": {
          "type": "constant",
          "value": "1",
          "next_parameter": null
        },
        "returntype": "bigint"
      },
      "dependent_measure_ref": null
    },
    {
      "name": "TOTAL_PLAYTIME",
      "function": {
        "expression": "SUM",
        "parameter": {
          "type": "column",
          "value": "PLAYTIME",
          "next_parameter": null
        },
        "returntype": "bigint"
      },
      "dependent_measure_ref": null
    },
    {
      "name": "TOP_PLAYER",
      "function": {
        "expression": "TOP_N",
        "parameter": {
          "type": "column",
          "value": "PLAYTIME",
          "next_parameter": {
            "type": "column",
            "value": "USER_ID",
            "next_parameter": null
          }
        },
        "returntype": "topn(100)"
      },
      "dependent_measure_ref": null
    }
  ],
  "rowkey": {
    "rowkey_columns": [
      {
        "column": "GAME",
        "encoding": "dict",
        "isShardBy": false
      },
      {
        "column": "LANG",
        "encoding": "dict",
        "isShardBy": false
      }
    ]
  },
  "signature": "jM6BX7iZE3oHEN+aw0tXaw==",
  "last_modified": 1465892097992,
  "model_name": "logout_full",
  "null_string": null,
  "hbase_mapping": {
    "column_family": [
      {
        "name": "F1",
        "columns": [
          {
            "qualifier": "M",
            "measure_refs": [
              "_COUNT_",
              "TOTAL_PLAYTIME",
              "TOP_PLAYER"
            ]
          }
        ]
      }
    ]
  },
  "aggregation_groups": [
    {
      "includes": [
        "GAME",
        "LANG"
      ],
      "select_rule": {
        "hierarchy_dims": [],
        "mandatory_dims": [],
        "joint_dims": []
      }
    }
  ],
  "notify_list": [],
  "status_need_notify": [
    "ERROR",
    "DISCARDED",
    "SUCCEED"
  ],
  "partition_date_start": 0,
  "partition_date_end": 3153600000000,
  "auto_merge_time_ranges": [
    604800000,
    2419200000
  ],
  "retention_range": 0,
  "engine_type": 2,
  "storage_type": 2,
  "override_kylin_properties": {}
}
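To make the descriptor concrete, here is an exact, in-memory Python model of what the three measures compute for each (GAME, LANG) group of the base cuboid. It assumes that TOP_N ranks USER_ID values by summed PLAYTIME; it is only an illustration, not Kylin's implementation, which as far as I know uses an approximate counter instead of keeping every user. The names `Row` and `base_cuboid` are hypothetical:

```python
import heapq
from collections import defaultdict
from typing import Iterable, NamedTuple


class Row(NamedTuple):
    game: str
    lang: str
    user_id: str
    playtime: int


def base_cuboid(rows: Iterable[Row], top_n: int = 100):
    """Exact model of the measures per (GAME, LANG) group.

    _COUNT_        -> COUNT(1)
    TOTAL_PLAYTIME -> SUM(PLAYTIME)
    TOP_PLAYER     -> top `top_n` USER_ID values ranked by summed PLAYTIME
    """
    count = defaultdict(int)
    total = defaultdict(int)
    # (game, lang) -> user_id -> summed playtime; grows with distinct users.
    per_user = defaultdict(lambda: defaultdict(int))
    for r in rows:
        key = (r.game, r.lang)
        count[key] += 1
        total[key] += r.playtime
        per_user[key][r.user_id] += r.playtime
    result = {}
    for key in count:
        top = heapq.nlargest(top_n, per_user[key].items(), key=lambda kv: kv[1])
        result[key] = (count[key], total[key], top)
    return result


if __name__ == "__main__":
    rows = [Row("g1", "en", "u1", 10), Row("g1", "en", "u2", 5),
            Row("g1", "en", "u1", 7), Row("g2", "de", "u3", 3)]
    print(base_cuboid(rows, top_n=1))
```

In this model, COUNT and SUM need only one number per group, while the TOP_PLAYER state depends on the number of distinct USER_ID values rather than on the two small dimensions.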



On 15.06.2016 at 15:40, hongbin ma wrote:
kylin_job_conf.xml is used to override configuration settings for Kylin jobs.

Instead of trying to disable the container timeout, I suggest figuring out
why it is taking so long. A fact table of 3 GB is not that large, so
the mapper should not take that long. How many dimensions do you have? Can
you share your cube descriptor?
