I have only two dimensions and two measures. The intermediate table was
created quickly, but the build then got stuck at base_cuboid_builder. I saw two map
tasks running:
hdfs://hdfs-batch-layer/user/kylin/kylin_metadata/kylin-d9be0027-a0f5-4a99-bca1-f18841e3c29e/kylin_intermediate_logout_full_cube_19700101000000_20160626000000/000000_0:0+33554432 > sort
hdfs://hdfs-batch-layer/user/kylin/kylin_metadata/kylin-d9be0027-a0f5-4a99-bca1-f18841e3c29e/kylin_intermediate_logout_full_cube_19700101000000_20160626000000/000000_0:33554432+25438799 > sort
both at 100% CPU utilization.
The job ran for an hour and was then killed due to a timeout. I do not
see any exceptions in the YARN or Kylin logs. The same cube built successfully with
smaller data, and the log shows that memory was not fully used. I
had assumed the map tasks are simply slow, but from your experience it
sounds like something is wrong. Where should I look more closely? The
following is my cube; both dimensions are small:
{
  "uuid": "4974bc14-b792-4fd6-910d-598e6a16bb6d",
  "version": "1.5.2",
  "name": "logout_full_cube",
  "description": "",
  "dimensions": [
    {
      "name": "RAWDATA.USER_LOGOUT_FULL.GAME",
      "table": "RAWDATA.USER_LOGOUT_FULL",
      "column": "GAME",
      "derived": null
    },
    {
      "name": "RAWDATA.USER_LOGOUT_FULL.LANG",
      "table": "RAWDATA.USER_LOGOUT_FULL",
      "column": "LANG",
      "derived": null
    }
  ],
  "measures": [
    {
      "name": "_COUNT_",
      "function": {
        "expression": "COUNT",
        "parameter": {
          "type": "constant",
          "value": "1",
          "next_parameter": null
        },
        "returntype": "bigint"
      },
      "dependent_measure_ref": null
    },
    {
      "name": "TOTAL_PLAYTIME",
      "function": {
        "expression": "SUM",
        "parameter": {
          "type": "column",
          "value": "PLAYTIME",
          "next_parameter": null
        },
        "returntype": "bigint"
      },
      "dependent_measure_ref": null
    },
    {
      "name": "TOP_PLAYER",
      "function": {
        "expression": "TOP_N",
        "parameter": {
          "type": "column",
          "value": "PLAYTIME",
          "next_parameter": {
            "type": "column",
            "value": "USER_ID",
            "next_parameter": null
          }
        },
        "returntype": "topn(100)"
      },
      "dependent_measure_ref": null
    }
  ],
  "rowkey": {
    "rowkey_columns": [
      {
        "column": "GAME",
        "encoding": "dict",
        "isShardBy": false
      },
      {
        "column": "LANG",
        "encoding": "dict",
        "isShardBy": false
      }
    ]
  },
  "signature": "jM6BX7iZE3oHEN+aw0tXaw==",
  "last_modified": 1465892097992,
  "model_name": "logout_full",
  "null_string": null,
  "hbase_mapping": {
    "column_family": [
      {
        "name": "F1",
        "columns": [
          {
            "qualifier": "M",
            "measure_refs": [
              "_COUNT_",
              "TOTAL_PLAYTIME",
              "TOP_PLAYER"
            ]
          }
        ]
      }
    ]
  },
  "aggregation_groups": [
    {
      "includes": [
        "GAME",
        "LANG"
      ],
      "select_rule": {
        "hierarchy_dims": [],
        "mandatory_dims": [],
        "joint_dims": []
      }
    }
  ],
  "notify_list": [],
  "status_need_notify": [
    "ERROR",
    "DISCARDED",
    "SUCCEED"
  ],
  "partition_date_start": 0,
  "partition_date_end": 3153600000000,
  "auto_merge_time_ranges": [
    604800000,
    2419200000
  ],
  "retention_range": 0,
  "engine_type": 2,
  "storage_type": 2,
  "override_kylin_properties": {}
}
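To make the size of this cube concrete, here is a minimal Python sketch that reads an abbreviated copy of the descriptor above (only the measures and aggregation group), enumerates the dimension combinations of the single aggregation group, and lists which source columns each measure reads. The helpers `measure_columns` and `cuboids` are hypothetical, not part of Kylin. The sketch assumes a plain 2^n lattice with no hierarchy, mandatory, or joint rules (all three lists are empty here); Kylin's own cuboid scheduler may build a slightly different set.

```python
import json
from itertools import combinations

# Abbreviated copy of the cube descriptor above (only the fields used here).
CUBE_DESC = """
{
  "measures": [
    {"name": "_COUNT_", "function": {"expression": "COUNT",
      "parameter": {"type": "constant", "value": "1", "next_parameter": null}}},
    {"name": "TOTAL_PLAYTIME", "function": {"expression": "SUM",
      "parameter": {"type": "column", "value": "PLAYTIME", "next_parameter": null}}},
    {"name": "TOP_PLAYER", "function": {"expression": "TOP_N",
      "parameter": {"type": "column", "value": "PLAYTIME",
        "next_parameter": {"type": "column", "value": "USER_ID", "next_parameter": null}}}}
  ],
  "aggregation_groups": [{"includes": ["GAME", "LANG"]}]
}
"""


def measure_columns(measure):
    """Hypothetical helper: follow the next_parameter chain and return
    every source column the measure reads (constants are skipped)."""
    cols = []
    param = measure["function"]["parameter"]
    while param is not None:
        if param["type"] == "column":
            cols.append(param["value"])
        param = param["next_parameter"]
    return cols


def cuboids(group):
    """Hypothetical helper: all dimension combinations of one aggregation
    group as a plain 2^n lattice, from the base (all dims) down to the
    empty combination. No hierarchy/mandatory/joint pruning is applied."""
    dims = group["includes"]
    return [combo
            for r in range(len(dims), -1, -1)
            for combo in combinations(dims, r)]


desc = json.loads(CUBE_DESC)
for group in desc["aggregation_groups"]:
    for combo in cuboids(group):
        print("cuboid:", combo)
for m in desc["measures"]:
    print(m["name"], m["function"]["expression"], measure_columns(m))
```

With two dimensions the lattice has only four combinations, and TOP_PLAYER is the only measure that reads a second column (USER_ID); the thread does not state how many distinct USER_ID values the table holds.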
On 15.06.2016 at 15:40, hongbin ma wrote:
kylin_job_conf.xml is used to override configuration for Kylin jobs.
Instead of trying to disable the container timeout, I suggest figuring out
why it's taking so long. A fact table of 3 GB is not that large, and
the mapper should not take that long. How many dimensions do you have? Can
you share your cube desc?
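On input size: the split strings in the map task status at the top of this thread appear to follow Hadoop's `FileSplit` text form, `path:start+length` (an assumption, not confirmed in the thread). The minimal Python sketch below, using a hypothetical helper `parse_split`, adds them up per file; the directory prefix of the paths is left out for brevity.

```python
from collections import defaultdict

# The two splits from the task status, with the directory prefix omitted.
SPLITS = [
    "000000_0:0+33554432",
    "000000_0:33554432+25438799",
]


def parse_split(desc):
    """Hypothetical helper: split 'path:start+length' into
    (path, start, length). rpartition is used because an hdfs:// path
    itself contains a colon after the scheme."""
    path, _, span = desc.rpartition(":")
    start, length = span.split("+")
    return path, int(start), int(length)


# Highest byte offset covered in each file = bytes read from that file,
# assuming the splits are contiguous (they are here: 0 -> 33554432 -> end).
bytes_per_file = defaultdict(int)
for s in SPLITS:
    path, start, length = parse_split(s)
    bytes_per_file[path] = max(bytes_per_file[path], start + length)

for path, size in bytes_per_file.items():
    print(f"{path}: {size} bytes = {size / 2**20:.1f} MiB")
```

If that reading is right, the two mappers together read about 56 MiB from a single intermediate file (the first split is exactly 32 MiB, 33554432 bytes), so raw input volume alone does not explain an hour-long map phase.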