kangkaisen created KYLIN-1694:
---------------------------------
Summary: make multiply coefficient configurable when estimating
cuboid size
Key: KYLIN-1694
URL: https://issues.apache.org/jira/browse/KYLIN-1694
Project: Kylin
Issue Type: Bug
Components: Job Engine
Affects Versions: v1.5.1, v1.5.0
Reporter: kangkaisen
Assignee: Dong Li
In the current version of the MRv2 build engine, CubeStatsReader estimates
cuboid size with hard-coded coefficients: "cube is memory hungry, storage size
estimation multiply 0.05" and "cube is not memory hungry, storage size
estimation multiply 0.25".
This has one major problem: the default multiply coefficient is too small, so
the estimated cuboid size is much less than the actual cuboid size. As a
result, both the number of HBase regions and the number of reducers in
CubeHFileJob are too small, which makes CubeHFileJob much slower.
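To illustrate why an underestimated size hurts, here is a minimal sketch of
how a region (or reducer) count might follow from the estimate. The class
name, the division-by-cut-size formula and the 10 GB cut size are assumptions
made for illustration; they are not taken from Kylin's code.

```java
// Hypothetical illustration only; the split formula and cut size are assumptions.
class RegionCountSketch {
    // Number of regions needed so that each holds at most regionCutGB of data.
    static int regionCount(double estimatedSizeGB, double regionCutGB) {
        if (regionCutGB <= 0) {
            throw new IllegalArgumentException("regionCutGB must be positive");
        }
        // Always create at least one region, even for a tiny estimate.
        return Math.max(1, (int) Math.ceil(estimatedSizeGB / regionCutGB));
    }
}
```

Under these assumptions, a 20 GB estimate with a 10 GB cut size yields 2
regions, while a 400 GB estimate yields 40. If the real data is closer to the
larger figure, each of the 2 regions and reducers has to absorb far more data
than intended.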
After we removed the default multiply coefficient, CubeHFileJob became much
faster.
We should make the multiply coefficient configurable, which would be more
user-friendly.
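A minimal sketch of what a configurable coefficient could look like is shown
below. The class name and both property keys are hypothetical and invented for
this example; this is not the actual CubeStatsReader code. The fallback values
are the 0.05 and 0.25 coefficients quoted above.

```java
import java.util.Properties;

// Hypothetical sketch; not Kylin's actual CubeStatsReader implementation.
class CuboidSizeEstimateSketch {
    // Hypothetical property keys, invented for illustration.
    static final String KEY_MEM_HUNGRY = "example.cuboid.size.ratio.memhungry";
    static final String KEY_NORMAL = "example.cuboid.size.ratio";

    // Fallbacks mirror the hard-coded coefficients quoted in this issue.
    static final double DEFAULT_MEM_HUNGRY = 0.05;
    static final double DEFAULT_NORMAL = 0.25;

    // Read the coefficient from config, falling back to the old default.
    static double coefficient(Properties config, boolean memoryHungry) {
        String key = memoryHungry ? KEY_MEM_HUNGRY : KEY_NORMAL;
        double fallback = memoryHungry ? DEFAULT_MEM_HUNGRY : DEFAULT_NORMAL;
        String raw = config.getProperty(key);
        return raw == null ? fallback : Double.parseDouble(raw.trim());
    }

    // Scale the raw size estimate by the configured coefficient.
    static double estimateSizeMB(double rawSizeMB, Properties config,
                                 boolean memoryHungry) {
        return rawSizeMB * coefficient(config, memoryHungry);
    }
}
```

With no property set, the sketch behaves like the current hard-coded logic.
Setting the coefficient to 1.0 corresponds to removing it, as in the
experiment described above.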
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)