[ https://issues.apache.org/jira/browse/KYLIN-2518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Billy Liu resolved KYLIN-2518.
------------------------------
       Resolution: Fixed
    Fix Version/s: v2.0.0

https://github.com/apache/kylin/commit/4c21821471cb261cfecdf8289c5f8284af817b3e

> Improve the sampling performance of FactDistinctColumns step
> ------------------------------------------------------------
>
>                 Key: KYLIN-2518
>                 URL: https://issues.apache.org/jira/browse/KYLIN-2518
>             Project: Kylin
>          Issue Type: Improvement
>            Reporter: XIE FAN
>            Assignee: XIE FAN
>             Fix For: v2.0.0
>
>
> The method putRowKeyToHLL() in FactDistinctColumnsMapper can be very slow 
> when the sampling rate is high. After careful profiling, we believe that its 
> performance can be improved by modifying its hash method. At the same time, 
> we also found an algorithm that can accurately estimate the row count of each 
> cuboid with a lower sampling rate. I will share more test results and 
> details of the algorithm once this issue is done.
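
The description does not say how the hash method was changed, so the following is only a sketch of one common way to cut hashing cost when the same row must be hashed once per cuboid: hash each column value once, then combine the per-column hashes for each cuboid. All names here (RowHashSketch, hashColumn, hashRow, cuboidHash) are hypothetical and are not taken from the Kylin code base; the hash function is an FNV-1a-style hash with a MurmurHash3 finalizer, chosen only to keep the example dependency-free.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration, not Kylin's implementation.
public class RowHashSketch {

    // FNV-1a-style 64-bit hash of one column value. The column index is mixed
    // into the seed so the same string in two different columns hashes differently.
    public static long hashColumn(int columnIndex, String value) {
        long h = 0xcbf29ce484222325L ^ columnIndex;
        for (byte b : value.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;
        }
        return mix(h);
    }

    // 64-bit finalizer from MurmurHash3 (fmix64), used to spread the bits.
    static long mix(long h) {
        h ^= h >>> 33;
        h *= 0xff51afd7ed558ccdL;
        h ^= h >>> 33;
        h *= 0xc4ceb9fe1a85ec53L;
        h ^= h >>> 33;
        return h;
    }

    // Hash every column of the row exactly once.
    public static long[] hashRow(String[] row) {
        long[] hashes = new long[row.length];
        for (int i = 0; i < row.length; i++) {
            hashes[i] = hashColumn(i, row[i]);
        }
        return hashes;
    }

    // Combine the precomputed column hashes for one cuboid. The cost is one
    // addition per column instead of rehashing the concatenated row key bytes.
    public static long cuboidHash(long[] columnHashes, int[] cuboidColumns) {
        long sum = 0;
        for (int c : cuboidColumns) {
            sum += columnHashes[c];
        }
        return sum;
    }

    public static void main(String[] args) {
        String[] row = {"2017-03-01", "CN", "mobile"};
        int[][] cuboids = {{0}, {0, 1}, {0, 1, 2}, {1, 2}};

        long[] columnHashes = hashRow(row); // computed once per row
        for (int[] cuboid : cuboids) {
            // In a real sampler this value would be fed to the cuboid's HLL counter.
            System.out.println(Long.toHexString(cuboidHash(columnHashes, cuboid)));
        }
    }
}
```

With this layout the per-row work is one hash per column plus one addition per (cuboid, column) pair, so the cost of the expensive hash no longer grows with the number of cuboids. Whether this matches the committed change can be checked against the commit linked above.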



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
