[jira] [Commented] (KYLIN-4941) Support encoding raw data to base cuboid column-by-column

ASF GitHub Bot (Jira) Tue, 18 May 2021 18:39:05 -0700


    [ https://issues.apache.org/jira/browse/KYLIN-4941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17347246#comment-17347246 ]


ASF GitHub Bot commented on KYLIN-4941:
---------------------------------------

zhangayqian commented on a change in pull request #1631:
URL: https://github.com/apache/kylin/pull/1631#discussion_r633984742



##########
File path: engine-spark/src/main/java/org/apache/kylin/engine/spark/SparkCubingByLayer.java
##########
@@ -330,15 +409,12 @@ public EncodeBaseCuboid(String cubeName, String segmentId, String metaurl, Seria
                             long baseCuboidId = Cuboid.getBaseCuboidId(cubeDesc);
                             Cuboid baseCuboid = Cuboid.findForMandatory(cubeDesc, baseCuboidId);
                             String splitKey = String.valueOf(TaskContext.getPartitionId());
-                            try {

Review comment:
       Why was this try/catch removed?

##########
File path: core-common/src/main/java/org/apache/kylin/common/KylinConfigBase.java
##########
@@ -2709,7 +2709,11 @@ public int getDistCPMaxMapNum(){
         return Integer.valueOf(getOptional("kylin.storage.distcp-max-map-num", "50"));
     }
 
-    public String getKylinDictCacheStrength(){
+    public String getKylinDictCacheStrength() {
         return getOptional("kylin.dict.cache.strength", "soft");
-    };
+    }
+
+    public boolean encodeBaseCuboidColumnByColumn() {
+        return Boolean.valueOf(getOptional("kylin.job.encode.base.cuboid.column-by-column", "false"));

Review comment:
       Would this configuration item be more suitable as a cube-level configuration?
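
To illustrate what a cube-level setting would mean here, the following is a minimal sketch, not Kylin's actual configuration code. The class `LayeredConfig` and its two maps are hypothetical; only the property name and the `"false"` default come from the diff above. It assumes a cube can carry its own property overrides that shadow the global properties.

```java
import java.util.HashMap;
import java.util.Map;

public class CubeLevelConfigSketch {

    /** Hypothetical layered config: cube-level overrides shadow global properties. */
    static class LayeredConfig {
        private final Map<String, String> global;
        private final Map<String, String> cubeOverrides;

        LayeredConfig(Map<String, String> global, Map<String, String> cubeOverrides) {
            this.global = global;
            this.cubeOverrides = cubeOverrides;
        }

        /** Looks in the cube overrides first, then the global properties, then the default. */
        String getOptional(String key, String defaultValue) {
            String value = cubeOverrides.get(key);
            if (value == null) {
                value = global.get(key);
            }
            return value != null ? value : defaultValue;
        }

        /** Same property name and default as in the quoted diff. */
        boolean encodeBaseCuboidColumnByColumn() {
            return Boolean.parseBoolean(
                    getOptional("kylin.job.encode.base.cuboid.column-by-column", "false"));
        }
    }

    public static void main(String[] args) {
        Map<String, String> global = new HashMap<>(); // property not set globally
        Map<String, String> cubeA = new HashMap<>();
        cubeA.put("kylin.job.encode.base.cuboid.column-by-column", "true");
        Map<String, String> cubeB = new HashMap<>(); // no override for this cube

        System.out.println(new LayeredConfig(global, cubeA).encodeBaseCuboidColumnByColumn());
        System.out.println(new LayeredConfig(global, cubeB).encodeBaseCuboidColumnByColumn());
    }
}
```

With a layout like this, only cubes that have several large dictionaries would need to enable the option, while the others keep the default.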




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


> Support encoding raw data to base cuboid column-by-column
> ---------------------------------------------------------
>
>                 Key: KYLIN-4941
>                 URL: https://issues.apache.org/jira/browse/KYLIN-4941
>             Project: Kylin
>          Issue Type: Improvement
>          Components: Job Engine
>    Affects Versions: v3.1.1
>            Reporter: ShengJun Zheng
>            Assignee: ShengJun Zheng
>            Priority: Major
>             Fix For: v3.1.3
>
>
> When building with the Spark engine, the first step is to encode the Hive 
> table's rows into base cuboid data.
> The existing implementation encodes row by row. If the cube has several 
> dictionary-encoded measures, it has to use all of the dictionaries at the 
> same time to encode a single row. This causes heavy memory usage and a low 
> hit ratio in the dictionary cache.
> We optimized this case by encoding column by column, which brought a 
> significant improvement for cubes with several high-cardinality 
> dictionary-encoded measures.
> We will refine the implementation based on Kylin 3.x and share it.
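
The difference between the two encoding orders described above can be sketched as follows. This is a simplified illustration, not the code from the pull request: `DictionaryCache`, `buildDictionaries`, and both `encode*` methods are hypothetical names, the dictionaries are plain in-memory maps, and the cache is a small LRU that counts how often a dictionary has to be (re)loaded.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ColumnByColumnEncodingSketch {

    /** Hypothetical LRU cache that keeps at most `capacity` dictionaries loaded. */
    static class DictionaryCache {
        private final List<Map<String, Integer>> source; // stands in for dictionaries on storage
        private final LinkedHashMap<Integer, Map<String, Integer>> loaded;
        int loads = 0; // cache misses, i.e. dictionary (re)loads

        DictionaryCache(List<Map<String, Integer>> source, final int capacity) {
            this.source = source;
            // accessOrder = true makes the map evict the least recently used entry.
            this.loaded = new LinkedHashMap<Integer, Map<String, Integer>>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<Integer, Map<String, Integer>> eldest) {
                    return size() > capacity;
                }
            };
        }

        Map<String, Integer> get(int column) {
            Map<String, Integer> dict = loaded.get(column);
            if (dict == null) {
                loads++;
                dict = source.get(column);
                loaded.put(column, dict);
            }
            return dict;
        }
    }

    /** Builds one dictionary per column, assigning ids in order of first appearance. */
    static List<Map<String, Integer>> buildDictionaries(String[][] rows) {
        List<Map<String, Integer>> dicts = new ArrayList<>();
        int cols = rows.length == 0 ? 0 : rows[0].length;
        for (int c = 0; c < cols; c++) {
            Map<String, Integer> dict = new HashMap<>();
            for (String[] row : rows) {
                dict.putIfAbsent(row[c], dict.size());
            }
            dicts.add(dict);
        }
        return dicts;
    }

    /** Row-major order: every row touches every dictionary. */
    static int[][] encodeRowByRow(String[][] rows, DictionaryCache cache) {
        int cols = rows.length == 0 ? 0 : rows[0].length;
        int[][] out = new int[rows.length][cols];
        for (int r = 0; r < rows.length; r++) {
            for (int c = 0; c < cols; c++) {
                out[r][c] = cache.get(c).get(rows[r][c]);
            }
        }
        return out;
    }

    /** Column-major order: one dictionary is in use at a time. */
    static int[][] encodeColumnByColumn(String[][] rows, DictionaryCache cache) {
        int cols = rows.length == 0 ? 0 : rows[0].length;
        int[][] out = new int[rows.length][cols];
        for (int c = 0; c < cols; c++) {
            for (int r = 0; r < rows.length; r++) {
                out[r][c] = cache.get(c).get(rows[r][c]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String[][] rows = {{"a", "x"}, {"b", "y"}, {"a", "y"}};
        List<Map<String, Integer>> dicts = buildDictionaries(rows);

        DictionaryCache rowCache = new DictionaryCache(dicts, 1);
        DictionaryCache colCache = new DictionaryCache(dicts, 1);
        int[][] byRow = encodeRowByRow(rows, rowCache);
        int[][] byCol = encodeColumnByColumn(rows, colCache);

        System.out.println("same result: " + Arrays.deepEquals(byRow, byCol));
        System.out.println("row-by-row loads: " + rowCache.loads);
        System.out.println("column-by-column loads: " + colCache.loads);
    }
}
```

When the cache can hold only one dictionary, the row-by-row pass misses on every cell because it alternates between dictionaries, while the column-by-column pass loads each dictionary once. Both orders produce the same encoded values.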



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
