[ https://issues.apache.org/jira/browse/KYLIN-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17089386#comment-17089386 ]

ASF GitHub Bot commented on KYLIN-4185:
---------------------------------------

nichunen commented on a change in pull request #1071:
URL: https://github.com/apache/kylin/pull/1071#discussion_r368454036



##########
File path: website/_docs/install/configuration.md
##########
@@ -301,6 +301,7 @@ Both Kylin and HBase use compression when writing to disk, so Kylin will multipl
 - `kylin.cube.size-estimate-memhungry-ratio`: Deprecated, default is 0.05
 - `kylin.cube.size-estimate-countdistinct-ratio`: Cube Size Estimation with count distinct metric, default value is 0.5
 - `kylin.cube.size-estimate-topn-ratio`: Cube Size Estimation with TopN metric, default value is 0.5
+- `kylin.cube.size-estimate-enable-optimize`: Use historical estimation result to optimize the new one, default value is false

Review comment:
       I think the description is a little puzzling. In my opinion, just saying "enable optimization of cube size estimation" is enough.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


> CubeStatsReader estimate wrong cube size
> ----------------------------------------
>
>                 Key: KYLIN-4185
>                 URL: https://issues.apache.org/jira/browse/KYLIN-4185
>             Project: Kylin
>          Issue Type: Improvement
>            Reporter: ZhouKang
>            Assignee: ZhouKang
>            Priority: Major
>             Fix For: v3.1.0
>
>
> CubeStatsReader estimates the wrong cube size, which causes a lot of problems.
> When the estimated size is much larger than the real size, the Spark
> application's executor number is small, and the cube build step will take a
> long time; sometimes the step will fail due to the large dataset.
> When the estimated size is much smaller than the real size, the cuboid files
> in HDFS are small, and there are many of them.
>  
> In our production environment, both situations have happened.
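
The failure modes described above can be illustrated with a toy sizing rule. This is a hypothetical sketch, not Kylin's actual algorithm: it simply assumes the build planner provisions Spark executors in proportion to the *estimated* cube size, so a bad estimate misallocates resources relative to the *real* data (the function name, sizes, and per-executor capacity are all illustrative assumptions).

```python
# Toy illustration (hypothetical numbers and names, not Kylin's actual logic):
# a build planner sizes Spark executors from the estimated cube size, so a
# wrong estimate under- or over-provisions the build for the real size.

def executors_for(size_mb: float, mb_per_executor: float = 1024.0) -> int:
    """Provision roughly one executor per 1 GB of (estimated) cube data."""
    return max(1, round(size_mb / mb_per_executor))

real_size_mb = 8192.0    # actual cube size: 8 GB
under_estimate = 1024.0  # estimator off by 8x (too small)
over_estimate = 65536.0  # estimator off by 8x (too large)

print(executors_for(real_size_mb))    # → 8 (the right allocation)
print(executors_for(under_estimate))  # → 1 executor: slow or failing build
print(executors_for(over_estimate))   # → 64 executors: many tiny output files
```

Either direction of error hurts: the underestimate starves the build step, while the overestimate fragments the cuboid output into many small HDFS files, matching the two situations reported above.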



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
