[ https://issues.apache.org/jira/browse/KYLIN-4343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17136593#comment-17136593 ]
ASF GitHub Bot commented on KYLIN-4343:
---------------------------------------
hit-lacus edited a comment on pull request #1267:
URL: https://github.com/apache/kylin/pull/1267#issuecomment-644727013
### Create tables and extract distinct values
```sh
zookeeper lock path :/mr_dict_ephemeral_lock/UserActionCubeByHive_NO2, result is false
zookeeper get lock costTime : 0 s
Build Hive Global Dictionary by: hive -e "set mapreduce.job.name=Build Hive Global Dict - extract distinct value;
USE LACUS;
set hive.exec.compress.output=false;set hive.mapred.mode=unstrict;CREATE TABLE IF NOT EXISTS LACUS.UserActionCubeByHive_NO2_global_dict
( dict_key STRING COMMENT '',
dict_val INT COMMENT ''
)
COMMENT 'Hive Global Dictionary'
PARTITIONED BY (dict_column string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;
DROP TABLE IF EXISTS kylin_intermediate_useractioncubebyhive_no2_e99a9c08_3437_06d8_796f_807dd224a970__distinct_value;
CREATE TABLE IF NOT EXISTS kylin_intermediate_useractioncubebyhive_no2_e99a9c08_3437_06d8_796f_807dd224a970__distinct_value
(
dict_key STRING COMMENT ''
)
COMMENT ''
PARTITIONED BY (dict_column string)
STORED AS TEXTFILE
;
DROP TABLE IF EXISTS kylin_intermediate_useractioncubebyhive_no2_e99a9c08_3437_06d8_796f_807dd224a970_global_dict;
CREATE TABLE IF NOT EXISTS kylin_intermediate_useractioncubebyhive_no2_e99a9c08_3437_06d8_796f_807dd224a970_global_dict
(
dict_key STRING COMMENT '' ,
dict_val STRING COMMENT ''
)
COMMENT ''
PARTITIONED BY (dict_column string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
;
INSERT OVERWRITE TABLE kylin_intermediate_useractioncubebyhive_no2_e99a9c08_3437_06d8_796f_807dd224a970__distinct_value
PARTITION (dict_column = 'USERACTIONLOGSAMPLE_PLAY_ID')
SELECT a.DICT_KEY FROM (
SELECT
USERACTIONLOGSAMPLE_PLAY_ID as DICT_KEY
FROM
kylin_intermediate_useractioncubebyhive_no2_e99a9c08_3437_06d8_796f_807dd224a970
GROUP BY USERACTIONLOGSAMPLE_PLAY_ID) a
LEFT JOIN
(SELECT DICT_KEY FROM LACUS.UserActionCubeByHive_NO2_global_dict WHERE DICT_COLUMN = 'USERACTIONLOGSAMPLE_PLAY_ID' ) b
ON a.DICT_KEY = b.DICT_KEY
WHERE b.DICT_KEY IS NULL
;
INSERT OVERWRITE TABLE kylin_intermediate_useractioncubebyhive_no2_e99a9c08_3437_06d8_796f_807dd224a970__distinct_value
PARTITION (dict_column = 'USERACTIONLOGSAMPLE_PLAY_DURATION')
SELECT a.DICT_KEY FROM (
SELECT
USERACTIONLOGSAMPLE_PLAY_DURATION as DICT_KEY
FROM
kylin_intermediate_useractioncubebyhive_no2_e99a9c08_3437_06d8_796f_807dd224a970
GROUP BY USERACTIONLOGSAMPLE_PLAY_DURATION) a
LEFT JOIN
(SELECT DICT_KEY FROM LACUS.UserActionCubeByHive_NO2_global_dict WHERE DICT_COLUMN = 'USERACTIONLOGSAMPLE_PLAY_DURATION' ) b
ON a.DICT_KEY = b.DICT_KEY
WHERE b.DICT_KEY IS NULL
;
INSERT OVERWRITE TABLE kylin_intermediate_useractioncubebyhive_no2_e99a9c08_3437_06d8_796f_807dd224a970__distinct_value
PARTITION (DICT_COLUMN = 'KYLIN_MAX_DISTINCT_COUNT')
SELECT CONCAT_WS(',', tc.dict_column, cast(tc.total_distinct_val AS String), if(tm.max_dict_val is null, '0', cast(max_dict_val as string)))
FROM (
SELECT dict_column, count(1) total_distinct_val
FROM
LACUS.kylin_intermediate_useractioncubebyhive_no2_e99a9c08_3437_06d8_796f_807dd224a970__distinct_value
WHERE DICT_COLUMN != 'KYLIN_MAX_DISTINCT_COUNT'
GROUP BY dict_column) tc
LEFT JOIN (
SELECT dict_column, if(max(dict_val) is null, 0, max(dict_val)) as max_dict_val
FROM LACUS.UserActionCubeByHive_NO2_global_dict
GROUP BY dict_column) tm
ON tc.dict_column = tm.dict_column;
" --hiveconf hive.merge.mapredfiles=false --hiveconf hive.auto.convert.join=true --hiveconf dfs.replication=2 --hiveconf hive.exec.compress.output=true --hiveconf hive.auto.convert.join.noconditionaltask=true --hiveconf mapreduce.job.split.metainfo.maxsize=-1 --hiveconf hive.merge.mapfiles=false --hiveconf hive.auto.convert.join.noconditionaltask.size=100000000 --hiveconf hive.stats.autogather=true
ls: cannot access /root/lib/spark-2.3.3-bin-hadoop2.6/lib/spark-assembly-*.jar: No such file or directory
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512M; support was removed in 8.0
Java HotSpot(TM) 64-Bit Server VM warning: Using incremental CMS is deprecated and will likely be removed in a future release
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512M; support was removed in 8.0
Logging initialized using configuration in jar:file:/opt/cloudera/parcels/CDH-5.7.6-1.cdh5.7.6.p0.6/jars/hive-common-1.1.0-cdh5.7.6.jar!/hive-log4j.properties
OK
Time taken: 1.997 seconds
OK
Time taken: 0.45 seconds
OK
Time taken: 0.084 seconds
OK
Time taken: 0.165 seconds
OK
Time taken: 0.056 seconds
OK
Time taken: 0.175 seconds
Query ID = root_20200616195151_5b911606-e6ed-4ff5-80a8-bc3091f6064b
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1589169585068_5803, Tracking URL = http://cdh-master:8088/proxy/application_1589169585068_5803/
Kill Command = /opt/cloudera/parcels/CDH-5.7.6-1.cdh5.7.6.p0.6/bin/../lib/hadoop/bin/hadoop job -kill job_1589169585068_5803
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2020-06-16 19:51:38,155 Stage-1 map = 0%, reduce = 0%
2020-06-16 19:51:43,328 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.23 sec
2020-06-16 19:51:49,505 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 4.77 sec
MapReduce Total cumulative CPU time: 4 seconds 770 msec
Ended Job = job_1589169585068_5803
Stage-7 is selected by condition resolver.
Stage-2 is filtered out by condition resolver.
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512M; support was removed in 8.0
Execution log at: /tmp/root/root_20200616195151_5b911606-e6ed-4ff5-80a8-bc3091f6064b.log
2020-06-16 07:51:53 Starting to launch local task to process map join; maximum memory = 1908932608
2020-06-16 07:51:53 Dump the side-table for tag: 1 with group count: 0 into file: file:/tmp/root/f4ddf013-1d00-4e3e-9b46-c4ffbfa79e0e/hive_2020-06-16_19-51-30_297_4198334643994193254-1/-local-10003/HashTable-Stage-5/MapJoin-mapfile01--.hashtable
2020-06-16 07:51:53 Uploaded 1 File to: file:/tmp/root/f4ddf013-1d00-4e3e-9b46-c4ffbfa79e0e/hive_2020-06-16_19-51-30_297_4198334643994193254-1/-local-10003/HashTable-Stage-5/MapJoin-mapfile01--.hashtable (260 bytes)
2020-06-16 07:51:53 End of local task; Time Taken: 0.432 sec.
Execution completed successfully
MapredLocal task succeeded
Launching Job 3 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1589169585068_5804, Tracking URL = http://cdh-master:8088/proxy/application_1589169585068_5804/
Kill Command = /opt/cloudera/parcels/CDH-5.7.6-1.cdh5.7.6.p0.6/bin/../lib/hadoop/bin/hadoop job -kill job_1589169585068_5804
Hadoop job information for Stage-5: number of mappers: 1; number of reducers: 0
2020-06-16 19:51:59,633 Stage-5 map = 0%, reduce = 0%
2020-06-16 19:52:04,794 Stage-5 map = 100%, reduce = 0%, Cumulative CPU 3.58 sec
MapReduce Total cumulative CPU time: 3 seconds 580 msec
Ended Job = job_1589169585068_5804
Loading data to table lacus.kylin_intermediate_useractioncubebyhive_no2_e99a9c08_3437_06d8_796f_807dd224a970__distinct_value partition (dict_column=USERACTIONLOGSAMPLE_PLAY_ID)
Partition lacus.kylin_intermediate_useractioncubebyhive_no2_e99a9c08_3437_06d8_796f_807dd224a970__distinct_value{dict_column=USERACTIONLOGSAMPLE_PLAY_ID} stats: [numFiles=1, numRows=10000, totalSize=527979, rawDataSize=517979]
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 4.77 sec HDFS Read: 502687 HDFS Write: 704955 SUCCESS
Stage-Stage-5: Map: 1 Cumulative CPU: 3.58 sec HDFS Read: 710940 HDFS Write: 528186 SUCCESS
Total MapReduce CPU Time Spent: 8 seconds 350 msec
OK
Time taken: 37.21 seconds
Query ID = root_20200616195252_edb3c5ad-fa39-445c-a214-dc68d566e30e
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1589169585068_5805, Tracking URL = http://cdh-master:8088/proxy/application_1589169585068_5805/
Kill Command = /opt/cloudera/parcels/CDH-5.7.6-1.cdh5.7.6.p0.6/bin/../lib/hadoop/bin/hadoop job -kill job_1589169585068_5805
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2020-06-16 19:52:13,708 Stage-1 map = 0%, reduce = 0%
2020-06-16 19:52:18,853 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.45 sec
2020-06-16 19:52:25,018 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 5.24 sec
MapReduce Total cumulative CPU time: 5 seconds 240 msec
Ended Job = job_1589169585068_5805
Stage-7 is selected by condition resolver.
Stage-2 is filtered out by condition resolver.
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512M; support was removed in 8.0
Execution log at: /tmp/root/root_20200616195252_edb3c5ad-fa39-445c-a214-dc68d566e30e.log
2020-06-16 07:52:28 Starting to launch local task to process map join; maximum memory = 1908932608
2020-06-16 07:52:29 Dump the side-table for tag: 1 with group count: 0 into file: file:/tmp/root/f4ddf013-1d00-4e3e-9b46-c4ffbfa79e0e/hive_2020-06-16_19-52-07_533_3381191652948885785-1/-local-10003/HashTable-Stage-5/MapJoin-mapfile11--.hashtable
2020-06-16 07:52:29 Uploaded 1 File to: file:/tmp/root/f4ddf013-1d00-4e3e-9b46-c4ffbfa79e0e/hive_2020-06-16_19-52-07_533_3381191652948885785-1/-local-10003/HashTable-Stage-5/MapJoin-mapfile11--.hashtable (260 bytes)
2020-06-16 07:52:29 End of local task; Time Taken: 0.574 sec.
Execution completed successfully
MapredLocal task succeeded
Launching Job 3 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1589169585068_5806, Tracking URL = http://cdh-master:8088/proxy/application_1589169585068_5806/
Kill Command = /opt/cloudera/parcels/CDH-5.7.6-1.cdh5.7.6.p0.6/bin/../lib/hadoop/bin/hadoop job -kill job_1589169585068_5806
Hadoop job information for Stage-5: number of mappers: 1; number of reducers: 0
2020-06-16 19:52:35,441 Stage-5 map = 0%, reduce = 0%
2020-06-16 19:52:41,604 Stage-5 map = 100%, reduce = 0%, Cumulative CPU 3.71 sec
MapReduce Total cumulative CPU time: 3 seconds 710 msec
Ended Job = job_1589169585068_5806
Loading data to table lacus.kylin_intermediate_useractioncubebyhive_no2_e99a9c08_3437_06d8_796f_807dd224a970__distinct_value partition (dict_column=USERACTIONLOGSAMPLE_PLAY_DURATION)
Partition lacus.kylin_intermediate_useractioncubebyhive_no2_e99a9c08_3437_06d8_796f_807dd224a970__distinct_value{dict_column=USERACTIONLOGSAMPLE_PLAY_DURATION} stats: [numFiles=1, numRows=4098, totalSize=29177, rawDataSize=25079]
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 5.24 sec HDFS Read: 502807 HDFS Write: 89619 SUCCESS
Stage-Stage-5: Map: 1 Cumulative CPU: 3.71 sec HDFS Read: 95878 HDFS Write: 29388 SUCCESS
Total MapReduce CPU Time Spent: 8 seconds 950 msec
OK
Time taken: 35.619 seconds
Query ID = root_20200616195252_5869795b-713e-4066-b062-48c0ff08d4a1
Total jobs = 4
Launching Job 1 out of 4
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1589169585068_5807, Tracking URL = http://cdh-master:8088/proxy/application_1589169585068_5807/
Kill Command = /opt/cloudera/parcels/CDH-5.7.6-1.cdh5.7.6.p0.6/bin/../lib/hadoop/bin/hadoop job -kill job_1589169585068_5807
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2020-06-16 19:52:49,232 Stage-1 map = 0%, reduce = 0%
2020-06-16 19:52:54,366 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.66 sec
2020-06-16 19:53:00,516 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 3.71 sec
MapReduce Total cumulative CPU time: 3 seconds 710 msec
Ended Job = job_1589169585068_5807
Launching Job 2 out of 4
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1589169585068_5808, Tracking URL = http://cdh-master:8088/proxy/application_1589169585068_5808/
Kill Command = /opt/cloudera/parcels/CDH-5.7.6-1.cdh5.7.6.p0.6/bin/../lib/hadoop/bin/hadoop job -kill job_1589169585068_5808
Hadoop job information for Stage-4: number of mappers: 1; number of reducers: 1
2020-06-16 19:53:07,019 Stage-4 map = 0%, reduce = 0%
2020-06-16 19:53:12,151 Stage-4 map = 100%, reduce = 0%, Cumulative CPU 1.32 sec
2020-06-16 19:53:18,296 Stage-4 map = 100%, reduce = 100%, Cumulative CPU 4.69 sec
MapReduce Total cumulative CPU time: 4 seconds 690 msec
Ended Job = job_1589169585068_5808
Stage-7 is selected by condition resolver.
Stage-2 is filtered out by condition resolver.
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512M; support was removed in 8.0
Execution log at: /tmp/root/root_20200616195252_5869795b-713e-4066-b062-48c0ff08d4a1.log
2020-06-16 07:53:21 Starting to launch local task to process map join; maximum memory = 1908932608
2020-06-16 07:53:22 Dump the side-table for tag: 1 with group count: 0 into file: file:/tmp/root/f4ddf013-1d00-4e3e-9b46-c4ffbfa79e0e/hive_2020-06-16_19-52-43_179_1208261470870565365-1/-local-10004/HashTable-Stage-5/MapJoin-mapfile21--.hashtable
2020-06-16 07:53:22 Uploaded 1 File to: file:/tmp/root/f4ddf013-1d00-4e3e-9b46-c4ffbfa79e0e/hive_2020-06-16_19-52-43_179_1208261470870565365-1/-local-10004/HashTable-Stage-5/MapJoin-mapfile21--.hashtable (260 bytes)
2020-06-16 07:53:22 End of local task; Time Taken: 0.704 sec.
Execution completed successfully
MapredLocal task succeeded
Launching Job 4 out of 4
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1589169585068_5809, Tracking URL = http://cdh-master:8088/proxy/application_1589169585068_5809/
Kill Command = /opt/cloudera/parcels/CDH-5.7.6-1.cdh5.7.6.p0.6/bin/../lib/hadoop/bin/hadoop job -kill job_1589169585068_5809
Hadoop job information for Stage-5: number of mappers: 1; number of reducers: 0
2020-06-16 19:53:28,514 Stage-5 map = 0%, reduce = 0%
2020-06-16 19:53:34,689 Stage-5 map = 100%, reduce = 0%, Cumulative CPU 3.19 sec
MapReduce Total cumulative CPU time: 3 seconds 190 msec
Ended Job = job_1589169585068_5809
Loading data to table lacus.kylin_intermediate_useractioncubebyhive_no2_e99a9c08_3437_06d8_796f_807dd224a970__distinct_value partition (dict_column=KYLIN_MAX_DISTINCT_COUNT)
Partition lacus.kylin_intermediate_useractioncubebyhive_no2_e99a9c08_3437_06d8_796f_807dd224a970__distinct_value{dict_column=KYLIN_MAX_DISTINCT_COUNT} stats: [numFiles=1, numRows=2, totalSize=77, rawDataSize=75]
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 3.71 sec HDFS Read: 565472 HDFS Write: 198 SUCCESS
Stage-Stage-4: Map: 1 Reduce: 1 Cumulative CPU: 4.69 sec HDFS Read: 8598 HDFS Write: 96 SUCCESS
Stage-Stage-5: Map: 1 Cumulative CPU: 3.19 sec HDFS Read: 6382 HDFS Write: 274 SUCCESS
Total MapReduce CPU Time Spent: 11 seconds 590 msec
OK
Time taken: 53.08 seconds
```
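For each dictionary column, the Hive statements above first keep only the values that are not yet in `LACUS.UserActionCubeByHive_NO2_global_dict` (the `LEFT JOIN ... WHERE b.DICT_KEY IS NULL` is an anti-join), then write one `column,new_distinct_count,max_dict_val` row per column into the `KYLIN_MAX_DISTINCT_COUNT` partition. The POSIX shell sketch below mimics that logic on plain text files so it can be tried without a cluster. The function names and file-based inputs are hypothetical, and `assign_ids` is only an assumption about how a later step could number new keys (from `max_dict_val + 1` upward); it is not Kylin's actual implementation.

```shell
#!/bin/sh
# Sketch only: emulates the per-column Hive steps with text files.
# Function names and file layouts are hypothetical, not part of Kylin.

# new_distinct_values SOURCE_KEYS DICT_KEYS
# Prints the distinct keys of SOURCE_KEYS that are absent from DICT_KEYS,
# like GROUP BY + LEFT JOIN ... WHERE b.DICT_KEY IS NULL (an anti-join).
new_distinct_values() {
    ndv_src=$(mktemp) || return 1
    ndv_dict=$(mktemp) || return 1
    LC_ALL=C sort -u "$1" > "$ndv_src"        # GROUP BY dict_key
    LC_ALL=C sort -u "$2" > "$ndv_dict"       # keys already in the dict
    LC_ALL=C comm -23 "$ndv_src" "$ndv_dict"  # lines only in the first file
    rm -f "$ndv_src" "$ndv_dict"
}

# summary_row COLUMN NEW_KEYS_FILE [MAX_DICT_VAL]
# Mirrors CONCAT_WS(',', dict_column, total_distinct_val, max_dict_val),
# printing '0' when the column has no dictionary entries yet.
# (Hive emits no row for a column with zero new keys; this prints 0.)
summary_row() {
    sr_total=$(wc -l < "$2" | tr -d ' ')
    printf '%s,%s,%s\n' "$1" "$sr_total" "${3:-0}"
}

# assign_ids SUMMARY_ROW NEW_KEYS_FILE
# Assumed next step: number new keys max_dict_val+1, max_dict_val+2, ...
# and print tab-separated dict_key/dict_val pairs, matching the
# FIELDS TERMINATED BY '\t' layout of the global dict table.
assign_ids() {
    ai_max=${1##*,}   # last comma-separated field is max_dict_val
    awk -v base="$ai_max" 'BEGIN { OFS = "\t" } { print $0, base + NR }' "$2"
}
```

Typical use would be `new_distinct_values play_id.txt dict_play_id.txt > new_play_id.txt` followed by `summary_row USERACTIONLOGSAMPLE_PLAY_ID new_play_id.txt`. In this run the global dictionary was presumably still empty (the map-join side table reports `group count: 0`), so the two summary rows would be `USERACTIONLOGSAMPLE_PLAY_ID,10000,0` and `USERACTIONLOGSAMPLE_PLAY_DURATION,4098,0`. That reading is consistent with the logged stats for that partition (`numRows=2`, `rawDataSize=75`, i.e. 35 + 40 characters).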
### Clean up
```shell
Hive table LACUS.kylin_intermediate_useractioncubebyhive_no2_e99a9c08_3437_06d8_796f_807dd224a970 is dropped.
Hive table LACUS.kylin_intermediate_useractioncubebyhive_no2_e99a9c08_3437_06d8_796f_807dd224a970__distinct_value is dropped.
Hive table LACUS.kylin_intermediate_useractioncubebyhive_no2_e99a9c08_3437_06d8_796f_807dd224a970_global_dict is dropped.
Path [hdfs://cdh-master:8020/LACUS/LACUS/kylin-23742cdf-63f9-bfb0-a446-201795163dd1/kylin_intermediate_useractioncubebyhive_no2_e99a9c08_3437_06d8_796f_807dd224a970] is deleted.
```
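The clean-up step drops the flat table plus its `__distinct_value` (double underscore) and `_global_dict` companions, then deletes the job's HDFS working directory. The persistent `LACUS.UserActionCubeByHive_NO2_global_dict` table is not touched. If a failed build leaves these intermediates behind, a sketch like the one below could print equivalent manual commands for review. The function name is hypothetical, it assumes the `hive` and `hdfs` command-line clients are on the `PATH`, and it is not how Kylin itself performs the clean-up; it only prints commands and never runs them.

```shell
#!/bin/sh
# Hypothetical helper: prints (does not run) commands equivalent to the
# clean-up log above. Review the output before piping it to sh.

# print_cleanup_commands DATABASE FLAT_TABLE HDFS_WORKING_DIR
print_cleanup_commands() {
    pc_db=$1
    pc_flat=$2
    pc_dir=$3
    # Intermediate tables only; the persistent global dict table is kept.
    for pc_table in "$pc_flat" "${pc_flat}__distinct_value" "${pc_flat}_global_dict"; do
        printf 'hive -e "DROP TABLE IF EXISTS %s.%s;"\n' "$pc_db" "$pc_table"
    done
    printf 'hdfs dfs -rm -r %s\n' "$pc_dir"
}
```

For the build above, the arguments would be `LACUS`, `kylin_intermediate_useractioncubebyhive_no2_e99a9c08_3437_06d8_796f_807dd224a970` and `hdfs://cdh-master:8020/LACUS/LACUS/kylin-23742cdf-63f9-bfb0-a446-201795163dd1/kylin_intermediate_useractioncubebyhive_no2_e99a9c08_3437_06d8_796f_807dd224a970`.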
![image](https://user-images.githubusercontent.com/14030549/84773097-7fe58380-b00e-11ea-914c-45164c43e0da.png)
> Build Global Dict by MR/Hive, new config
> ----------------------------------------
>
> Key: KYLIN-4343
> URL: https://issues.apache.org/jira/browse/KYLIN-4343
> Project: Kylin
> Issue Type: Sub-task
> Reporter: wangxiaojing
> Assignee: wangxiaojing
> Priority: Major
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)