[jira] [Updated] (KYLIN-5561) Optimize the build performance for models containing semi-additive measure

Guangyuan Feng (Jira) Tue, 13 Jun 2023 00:35:10 -0700


     [ 
https://issues.apache.org/jira/browse/KYLIN-5561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Guangyuan Feng updated KYLIN-5561:
----------------------------------
    Fix Version/s: 5.0-beta
                       (was: 5.0-alpha)

> Optimize the build performance for models containing semi-additive measure
> --------------------------------------------------------------------------
>
>                 Key: KYLIN-5561
>                 URL: https://issues.apache.org/jira/browse/KYLIN-5561
>             Project: Kylin
>          Issue Type: Bug
>          Components: Modeling
>    Affects Versions: 5.0-alpha
>            Reporter: Guangyuan Feng
>            Assignee: Yaguang Jia
>            Priority: Major
>             Fix For: 5.0-beta
>
>
> When building a model that uses the aggregate function `sum_lc`, the 
> calculation takes too long to complete even on a small dataset. After digging 
> into its implementation, we found the root cause: `serialize` always 
> allocates a new array of `1024 * 1024` bytes as temporary storage for the 
> serialized value of `SumLCCounter`.
> In fact, only a decimal and a long value of a `SumLCCounter` object need to 
> be serialized. The serialized data is generally about `8 + 8` bytes on a 
> 64-bit platform, so the temporary array is far larger than the result 
> requires.
> After reducing the initial size of the temporary array, for instance to 
> 32 bytes, the total time to complete the `sum_lc` calculation on a 10 GB 
> dataset dropped from 16 min to 4 min.
>  
> Here are the benchmark results:
> {code:java}
> // After optimization
> # Warmup: 1 iterations, 10 s each
> # Measurement: 5 iterations, 10 s each
> # Timeout: 10 min per iteration
> # Threads: 1 thread, will synchronize iterations
> # Benchmark mode: Throughput, ops/time
> # Benchmark: io.kyligence.pe.JmhSumLCApplication.dynamicLength
> # Run progress: 0.00% complete, ETA 00:04:00
> # Fork: 1 of 2
> # Warmup Iteration   1: 39082.864 ops/ms
> Iteration   1: 41760.550 ops/ms
> Iteration   2: 47911.634 ops/ms
> Iteration   3: 47353.936 ops/ms
> Iteration   4: 46888.688 ops/ms
> Iteration   5: 48378.075 ops/ms
> # Run progress: 25.00% complete, ETA 00:03:02
> # Fork: 2 of 2
> # Warmup Iteration   1: 39479.279 ops/ms
> Iteration   1: 42066.415 ops/ms
> Iteration   2: 48499.974 ops/ms
> Iteration   3: 48524.844 ops/ms
> Iteration   4: 48431.830 ops/ms
> Iteration   5: 48451.256 ops/ms
> Result "io.kyligence.pe.JmhSumLCApplication.dynamicLength":
>   46826.720 ±(99.9%) 4002.887 ops/ms [Average]
>   (min, avg, max) = (41760.550, 46826.720, 48524.844), stdev = 2647.662
>   CI (99.9%): [42823.833, 50829.607] (assumes normal distribution)
> // Before optimization
> # Warmup: 1 iterations, 10 s each
> # Measurement: 5 iterations, 10 s each
> # Timeout: 10 min per iteration
> # Threads: 1 thread, will synchronize iterations
> # Benchmark mode: Throughput, ops/time
> # Benchmark: io.kyligence.pe.JmhSumLCApplication.fixLength
> # Run progress: 50.00% complete, ETA 00:02:01
> # Fork: 1 of 2
> # Warmup Iteration   1: 22.364 ops/ms
> Iteration   1: 25.354 ops/ms
> Iteration   2: 25.252 ops/ms
> Iteration   3: 20.566 ops/ms
> Iteration   4: 20.668 ops/ms
> Iteration   5: 21.585 ops/ms
> # Run progress: 75.00% complete, ETA 00:01:00
> # Fork: 2 of 2
> # Warmup Iteration   1: 22.953 ops/ms
> Iteration   1: 25.362 ops/ms
> Iteration   2: 24.041 ops/ms
> Iteration   3: 21.774 ops/ms
> Iteration   4: 25.131 ops/ms
> Iteration   5: 25.594 ops/ms
> Result "io.kyligence.pe.JmhSumLCApplication.fixLength":
>   23.533 ±(99.9%) 3.210 ops/ms [Average]
>   (min, avg, max) = (20.566, 23.533, 25.594), stdev = 2.123
>   CI (99.9%): [20.323, 26.743] (assumes normal distribution)
> # Run complete. Total time: 00:04:03
> REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
> why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
> experiments, perform baseline and negative tests that provide experimental control, make sure
> the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
> Do not assume the numbers tell you what you want them to tell.
> Benchmark                           Mode  Cnt      Score      Error   Units
> JmhSumLCApplication.dynamicLength  thrpt   10  46826.720 ± 4002.887  ops/ms
> JmhSumLCApplication.fixLength      thrpt   10     23.533 ±    3.210  ops/ms 
> {code}
>  
>  
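The root cause described in the issue can be illustrated with a minimal sketch. This is not Kylin's actual code: the class `SumLcSerializeSketch`, the stand-in `Counter` type, both `serialize*` methods, and the byte layout (long, scale, length, unscaled bytes) are assumptions made for illustration. Only the `1024 * 1024` and 32-byte sizes come from the description. The fixed variant allocates 1 MiB per call, while the dynamic variant starts small and doubles only when the value does not fit.

```java
import java.math.BigDecimal;
import java.nio.BufferOverflowException;
import java.nio.ByteBuffer;
import java.util.Arrays;

public class SumLcSerializeSketch {
    // Hypothetical stand-in for SumLCCounter: a decimal sum plus a long value.
    static final class Counter {
        final BigDecimal sum;
        final long timestamp;

        Counter(BigDecimal sum, long timestamp) {
            this.sum = sum;
            this.timestamp = timestamp;
        }
    }

    static final int FIXED_SIZE = 1024 * 1024; // temporary array size cited in the issue
    static final int INITIAL_SIZE = 32;        // example initial size cited in the issue

    // Assumed layout: long value, decimal scale, unscaled length, unscaled bytes.
    static void write(Counter c, ByteBuffer buf) {
        byte[] unscaled = c.sum.unscaledValue().toByteArray();
        buf.putLong(c.timestamp);
        buf.putInt(c.sum.scale());
        buf.putInt(unscaled.length);
        buf.put(unscaled);
    }

    // "Before": every call allocates (and zero-fills) a 1 MiB scratch buffer.
    static byte[] serializeFixed(Counter c) {
        ByteBuffer buf = ByteBuffer.allocate(FIXED_SIZE);
        write(c, buf);
        return Arrays.copyOf(buf.array(), buf.position());
    }

    // "After": start with a small buffer and double it only on overflow.
    static byte[] serializeDynamic(Counter c) {
        int size = INITIAL_SIZE;
        while (true) {
            ByteBuffer buf = ByteBuffer.allocate(size);
            try {
                write(c, buf);
                return Arrays.copyOf(buf.array(), buf.position());
            } catch (BufferOverflowException e) {
                size *= 2; // value did not fit; retry with a larger buffer
            }
        }
    }

    public static void main(String[] args) {
        Counter c = new Counter(new BigDecimal("12345.67"), 1686641710000L);
        byte[] fixed = serializeFixed(c);
        byte[] dynamic = serializeDynamic(c);
        System.out.println("same bytes: " + Arrays.equals(fixed, dynamic));
        System.out.println("serialized length: " + dynamic.length);
    }
}
```

Both variants produce the same bytes; they differ only in how much scratch memory each call allocates, which is the cost the benchmark above measures as `fixLength` versus `dynamicLength`.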



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
