vu thanh dat created KYLIN-3123:
-----------------------------------
Summary: Improve Spark Cubing
Key: KYLIN-3123
URL: https://issues.apache.org/jira/browse/KYLIN-3123
Project: Kylin
Issue Type: Improvement
Components: Spark Engine
Affects Versions: v2.2.0
Environment: HDP, HBase, Spark 2.6, CentOS 7
Reporter: vu thanh dat
Fix For: v2.2.0
Attachments: dimension.bmp, measures.bmp, rowkeys.bmp, spark_so_slow_2.bmp
Hi all,

I'm using Spark to build a Kylin cube. The data is about 13 million rows per step. The cube is partitioned by date and has 10 dimensions and no measures. I set the following config:

kylin.storage.hbase.compression-codec=snappy
kylin.engine.spark.rdd-partition-cut-mb=1000
kylin.engine.spark.max-partition=5000
kylin.engine.spark-conf.spark.master=yarn
kylin.engine.spark-conf.spark.submit.deployMode=cluster
kylin.engine.spark-conf.spark.dynamicAllocation.enabled=true
kylin.engine.spark-conf.spark.dynamicAllocation.minExecutors=100
kylin.engine.spark-conf.spark.dynamicAllocation.maxExecutors=10240
kylin.engine.spark-conf.spark.dynamicAllocation.executorIdleTimeout=300
kylin.engine.spark-conf.spark.shuffle.service.enabled=true
kylin.engine.spark-conf.spark.shuffle.service.port=7337
kylin.engine.spark-conf.spark.yarn.queue=default
kylin.engine.spark-conf.spark.executor.memory=4G
kylin.engine.spark-conf.spark.executor.cores=4

The "Build Cube with Spark" step is very slow: it takes about 1 hour. Can you show me how to customize the Kylin config to speed up this step? I have 30 CentOS servers with 5.87 TB of storage and 448 cores in total. I have attached my config.

Best regards and thanks!
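As a rough sanity check on the settings above, the arithmetic below compares the executor requests with the cluster's capacity and shows how a large partition cut size can limit parallelism. This is a minimal sketch, not Kylin code: the function names are hypothetical, and the partition heuristic (estimated layer size divided by `rdd-partition-cut-mb`, clamped to `max-partition`) is an assumption about how the Spark engine uses these properties; the exact formula may differ. The 2000 MB layer size is a made-up example value, not a measurement from this cube.

```python
import math


def max_concurrent_executors(total_cores: int, cores_per_executor: int) -> int:
    # Upper bound on executors that can run at once if every core in the
    # cluster were free for this job (ignores memory and YARN overhead).
    return total_cores // cores_per_executor


def estimate_partitions(layer_size_mb: float, cut_mb: int,
                        max_partition: int, min_partition: int = 1) -> int:
    # ASSUMED heuristic: one RDD partition per `cut_mb` of estimated layer
    # size, clamped to [min_partition, max_partition].
    n = math.ceil(layer_size_mb / cut_mb)
    return max(min_partition, min(max_partition, n))


# 448 cores with spark.executor.cores=4 -> at most 112 executors, so
# maxExecutors=10240 can never be reached and minExecutors=100 asks for
# almost the whole cluster.
print(max_concurrent_executors(448, 4))  # 112

# Hypothetical 2000 MB layer: with a 1000 MB cut only 2 partitions (tasks)
# are created, so most executors sit idle; a 100 MB cut yields 20.
print(estimate_partitions(2000, 1000, 5000))  # 2
print(estimate_partitions(2000, 100, 5000))   # 20
```

If the heuristic holds, lowering `kylin.engine.spark.rdd-partition-cut-mb` would raise the number of tasks per layer, and bounding `maxExecutors` near the real capacity avoids requesting executors YARN cannot grant.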