[ https://issues.apache.org/jira/browse/KYLIN-5561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Guangyuan Feng updated KYLIN-5561:
----------------------------------
Fix Version/s: 5.0-beta
(was: 5.0-alpha)
> Optimize the build performance for models containing semi-additive measure
> --------------------------------------------------------------------------
>
> Key: KYLIN-5561
> URL: https://issues.apache.org/jira/browse/KYLIN-5561
> Project: Kylin
> Issue Type: Bug
> Components: Modeling
> Affects Versions: 5.0-alpha
> Reporter: Guangyuan Feng
> Assignee: Yaguang Jia
> Priority: Major
> Fix For: 5.0-beta
>
>
> When building a model with the aggregate function `{*}sum_lc{*}`, the calculation
> takes too long to complete, even on a small dataset. After digging into its
> implementation, we found that the root cause is that `{*}serialize{*}` always
> allocates a new `1024 * 1024`-byte (1 MB) array as a temporary buffer for the
> serialized value of `{*}SumLCCounter{*}`.
> In fact, only a decimal and a long value of a `{*}SumLCCounter{*}` object need
> to be serialized, so the serialized data is generally only about *`8 + 8`
> bytes* on a 64-bit platform. A 1 MB temporary array is therefore far larger
> than needed to store the result.
> After reducing the initial size of the temporary array to, for instance,
> {*}32 bytes{*}, the total time to complete the `{*}sum_lc{*}` calculation on a
> 10 GB dataset dropped from 16 min to 4 min.
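To make the cost concrete, here is a minimal Java sketch contrasting the two allocation strategies. It is not Kylin's code: the class `SumLCSerializerSketch`, its `Counter` stand-in for `SumLCCounter`, the byte layout (scale, unscaled length, unscaled bytes, long) and the grow-on-overflow strategy are all hypothetical, chosen only for illustration (the growth step is merely suggested by the benchmark name `dynamicLength`).

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.nio.BufferOverflowException;
import java.nio.ByteBuffer;

// Hypothetical sketch, NOT Kylin's actual SumLCCounter serializer.
class SumLCSerializerSketch {

    // "Before": a fresh 1 MB scratch buffer allocated on every call.
    static final int FIXED_CAPACITY = 1024 * 1024;
    // "After": a small starting size (the 32 bytes mentioned in the issue).
    static final int INITIAL_CAPACITY = 32;

    /** Hypothetical stand-in for SumLCCounter's state: a decimal and a long. */
    static final class Counter {
        final BigDecimal sum;
        final long timestamp;

        Counter(BigDecimal sum, long timestamp) {
            this.sum = sum;
            this.timestamp = timestamp;
        }
    }

    // Assumed layout: scale (int), unscaled length (int), unscaled bytes, timestamp (long).
    private static void writeTo(ByteBuffer buf, Counter c) {
        byte[] unscaled = c.sum.unscaledValue().toByteArray();
        buf.putInt(c.sum.scale());
        buf.putInt(unscaled.length);
        buf.put(unscaled);
        buf.putLong(c.timestamp);
    }

    /** Serializes c, starting at initialCapacity and doubling the buffer on overflow. */
    static byte[] serialize(Counter c, int initialCapacity) {
        int capacity = Math.max(1, initialCapacity);
        while (true) {
            ByteBuffer buf = ByteBuffer.allocate(capacity);
            try {
                writeTo(buf, c);
                byte[] out = new byte[buf.position()]; // copy only the bytes written
                buf.flip();
                buf.get(out);
                return out;
            } catch (BufferOverflowException e) {
                capacity *= 2; // value did not fit: retry with a larger buffer
            }
        }
    }

    /** Reads back a Counter written by serialize. */
    static Counter deserialize(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        int scale = buf.getInt();
        byte[] unscaled = new byte[buf.getInt()];
        buf.get(unscaled);
        long timestamp = buf.getLong();
        return new Counter(new BigDecimal(new BigInteger(unscaled), scale), timestamp);
    }

    public static void main(String[] args) {
        Counter c = new Counter(new BigDecimal("123.45"), 1_700_000_000_000L);
        byte[] before = serialize(c, FIXED_CAPACITY);  // allocates 1 MB per call
        byte[] after = serialize(c, INITIAL_CAPACITY); // allocates 32 bytes per call
        System.out.println(before.length + " " + after.length);
    }
}
```

For this small value both calls return the same 18 bytes; the only difference is the scratch allocation (1 MB versus 32 bytes per call), which is the overhead the issue identifies. Doubling on overflow keeps large decimals correct while the common case stays a single small allocation.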
>
> Here are the benchmark results:
> {code:java}
> // After optimization
> # Warmup: 1 iterations, 10 s each
> # Measurement: 5 iterations, 10 s each
> # Timeout: 10 min per iteration
> # Threads: 1 thread, will synchronize iterations
> # Benchmark mode: Throughput, ops/time
> # Benchmark: io.kyligence.pe.JmhSumLCApplication.dynamicLength
> # Run progress: 0.00% complete, ETA 00:04:00
> # Fork: 1 of 2
> # Warmup Iteration 1: 39082.864 ops/ms
> Iteration 1: 41760.550 ops/ms
> Iteration 2: 47911.634 ops/ms
> Iteration 3: 47353.936 ops/ms
> Iteration 4: 46888.688 ops/ms
> Iteration 5: 48378.075 ops/ms
> # Run progress: 25.00% complete, ETA 00:03:02
> # Fork: 2 of 2
> # Warmup Iteration 1: 39479.279 ops/ms
> Iteration 1: 42066.415 ops/ms
> Iteration 2: 48499.974 ops/ms
> Iteration 3: 48524.844 ops/ms
> Iteration 4: 48431.830 ops/ms
> Iteration 5: 48451.256 ops/ms
> Result "io.kyligence.pe.JmhSumLCApplication.dynamicLength":
> 46826.720 ±(99.9%) 4002.887 ops/ms [Average]
> (min, avg, max) = (41760.550, 46826.720, 48524.844), stdev = 2647.662
> CI (99.9%): [42823.833, 50829.607] (assumes normal distribution)
> // Before optimization
> # Warmup: 1 iterations, 10 s each
> # Measurement: 5 iterations, 10 s each
> # Timeout: 10 min per iteration
> # Threads: 1 thread, will synchronize iterations
> # Benchmark mode: Throughput, ops/time
> # Benchmark: io.kyligence.pe.JmhSumLCApplication.fixLength
> # Run progress: 50.00% complete, ETA 00:02:01
> # Fork: 1 of 2
> # Warmup Iteration 1: 22.364 ops/ms
> Iteration 1: 25.354 ops/ms
> Iteration 2: 25.252 ops/ms
> Iteration 3: 20.566 ops/ms
> Iteration 4: 20.668 ops/ms
> Iteration 5: 21.585 ops/ms
> # Run progress: 75.00% complete, ETA 00:01:00
> # Fork: 2 of 2
> # Warmup Iteration 1: 22.953 ops/ms
> Iteration 1: 25.362 ops/ms
> Iteration 2: 24.041 ops/ms
> Iteration 3: 21.774 ops/ms
> Iteration 4: 25.131 ops/ms
> Iteration 5: 25.594 ops/ms
> Result "io.kyligence.pe.JmhSumLCApplication.fixLength":
> 23.533 ±(99.9%) 3.210 ops/ms [Average]
> (min, avg, max) = (20.566, 23.533, 25.594), stdev = 2.123
> CI (99.9%): [20.323, 26.743] (assumes normal distribution)
> # Run complete. Total time: 00:04:03
> REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
> why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
> experiments, perform baseline and negative tests that provide experimental control, make sure
> the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
> Do not assume the numbers tell you what you want them to tell.
> Benchmark                           Mode  Cnt      Score      Error  Units
> JmhSumLCApplication.dynamicLength  thrpt   10  46826.720 ± 4002.887  ops/ms
> JmhSumLCApplication.fixLength      thrpt   10     23.533 ±    3.210  ops/ms
> {code}
>
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)