[
https://issues.apache.org/jira/browse/KYLIN-2518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Billy Liu resolved KYLIN-2518.
------------------------------
Resolution: Fixed
Fix Version/s: v2.0.0
https://github.com/apache/kylin/commit/4c21821471cb261cfecdf8289c5f8284af817b3e
> Improve the sampling performance of FactDistinctColumns step
> ------------------------------------------------------------
>
> Key: KYLIN-2518
> URL: https://issues.apache.org/jira/browse/KYLIN-2518
> Project: Kylin
> Issue Type: Improvement
> Reporter: XIE FAN
> Assignee: XIE FAN
> Fix For: v2.0.0
>
>
> The method putRowKeyToHLL() in FactDistinctColumnsMapper can be very slow
> when the sampling rate is high. After careful profiling, we believe that its
> performance can be improved by modifying its hash method. We have also
> found an algorithm that can estimate the row count of each cuboid
> accurately at a lower sampling rate. I will share more test results and
> details of the algorithm once this issue is done.
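
The description does not say how the hash method was changed, so the following is only a sketch of one general way to reduce per-row hashing cost when many cuboids are sampled: hash each column value once per row, then cheaply combine those precomputed hashes for every cuboid instead of rehashing the concatenated row key per cuboid. All names here (column_hashes, cuboid_hash, sample_distinct) are hypothetical and not taken from Kylin, and an exact Python set stands in for the HyperLogLog counter to keep the example self-contained.

```python
import hashlib

MASK64 = 0xFFFFFFFFFFFFFFFF


def column_hashes(row):
    """Hash every column value exactly once per row (64-bit digests)."""
    return [
        int.from_bytes(
            hashlib.blake2b(value.encode("utf-8"), digest_size=8).digest(), "big"
        )
        for value in row
    ]


def cuboid_hash(col_hashes, cuboid_columns):
    """Combine the precomputed column hashes for one cuboid.

    Only integer arithmetic is done here; no bytes are rehashed.
    """
    h = 0
    for idx in cuboid_columns:
        h = (h * 31 + col_hashes[idx]) & MASK64
    return h


def sample_distinct(rows, cuboids):
    """Count distinct row keys per cuboid.

    Assumption: a set replaces the HyperLogLog counter, so the result
    is exact rather than estimated.
    """
    seen = {name: set() for name in cuboids}
    for row in rows:
        hashes = column_hashes(row)  # one hashing pass per row
        for name, cols in cuboids.items():
            seen[name].add(cuboid_hash(hashes, cols))
    return {name: len(keys) for name, keys in seen.items()}


if __name__ == "__main__":
    rows = [("a", "x"), ("a", "y"), ("b", "x"), ("a", "x")]
    cuboids = {"c0": [0], "c1": [1], "c01": [0, 1]}
    print(sample_distinct(rows, cuboids))
```

With this layout the expensive hashing work grows with the number of columns per row, while the per-cuboid work is a few multiplications and additions.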
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)