florian-jobs commented on PR #2398:
URL: https://github.com/apache/systemds/pull/2398#issuecomment-3846326231

   We changed the `ColGroupDDCLZWBenchmark` class to use 
`estimateInMemorySize()` instead of `getExactSizeOnDisk()` for memory 
estimation. While `getExactSizeOnDisk()` returns the exact serialized size 
produced by `write()`, `estimateInMemorySize()` is the intended method in 
SystemDS for estimating the in-memory footprint of column groups. We also 
updated `estimateInMemorySize()` to account for the LZW metadata and the LZW 
mapping.
   
   Observations from the “distributed” benchmark:
   - The absolute byte numbers differ between the two modes (expected: 
in-memory estimate includes JVM overhead, whereas on-disk size is a compact 
serialization format).
   - However, the qualitative behavior and relative trends are very similar 
between `estimateInMemorySize()` and `getExactSizeOnDisk()` across the tested 
(size, nUnique) points (i.e., where DDCLZW is beneficial/harmful stays 
consistent).
   - As expected, DDCLZW tends to be unfavorable for very small inputs (fixed 
overhead dominates), while for larger sizes and low-to-moderate nUnique it 
achieves strong reductions. Around typical DDC representation boundaries (e.g., 
256→257, 65536→65537) the baseline DDC memory changes noticeably, which is 
reflected in the reported reductions as well.
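
As a rough cross-check of the numbers below, the "Memory reduction" column appears to be the saving of DDCLZW relative to DDC, and the jumps at the 256→257 and 65536→65537 boundaries correspond to about one extra byte per row in the DDC baseline. The sketch below recomputes both from the reported in-memory values. The class and method names are hypothetical and not part of the benchmark, and the formula is inferred from the reported figures rather than taken from the benchmark source.

```java
import java.util.Locale;

// Hypothetical helper for re-deriving the reported figures; not part of SystemDS.
public class MemoryReduction {

    // Saving of DDCLZW relative to DDC in percent (assumed formula).
    // A negative value means DDCLZW is larger than plain DDC.
    static double reductionPercent(long ddcBytes, long ddclzwBytes) {
        return 100.0 * (ddcBytes - ddclzwBytes) / ddcBytes;
    }

    // Growth of the DDC baseline per row between two adjacent nUnique values.
    static double extraBytesPerRow(long bytesBefore, long bytesAfter, long rows) {
        return (double) (bytesAfter - bytesBefore) / rows;
    }

    public static void main(String[] args) {
        // Size 100, nUnique 2: DDC 172 bytes, DDCLZW 248 bytes
        System.out.printf(Locale.ROOT, "%.2f%n", reductionPercent(172, 248));      // -44.19
        // Size 100000, nUnique 2: DDC 6420 bytes, DDCLZW 2696 bytes
        System.out.printf(Locale.ROOT, "%.2f%n", reductionPercent(6420, 2696));    // 58.01
        // DDC baseline at nUnique 256 vs 257, 100000 rows
        System.out.printf(Locale.ROOT, "%.2f%n",
            extraBytesPerRow(102208, 202216, 100000));                             // 1.00
        // DDC baseline at nUnique 65536 vs 65537, 100000 rows
        System.out.printf(Locale.ROOT, "%.2f%n",
            extraBytesPerRow(724448, 824496, 100000));                             // 1.00
    }
}
```

Because the DDCLZW size barely changes across these boundaries (30896 vs. 30992 bytes at 256→257), the reported reduction rises there even though DDCLZW itself did not get smaller.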
   
   Below are the results from `benchmarkDistributed` using both modes for 
comparison.
   
   ```text
   
================================================================================
   Benchmark: benchmarkDistributed using estimateInMemorySize
   
================================================================================
   
   ................................... Size: 100 
...................................
   Size:     100 | nUnique:     2 | Entropy: 100,00% | DDC:     172 bytes | 
DDCLZW:     248 bytes | Memory reduction:  -44,19% | De-/Compression speedup: 
0,01/0,00 times
   Size:     100 | nUnique:     3 | Entropy:  99,99% | DDC:     280 bytes | 
DDCLZW:     272 bytes | Memory reduction:    2,86% | De-/Compression speedup: 
0,01/0,00 times
   Size:     100 | nUnique:     5 | Entropy: 100,00% | DDC:     296 bytes | 
DDCLZW:     312 bytes | Memory reduction:   -5,41% | De-/Compression speedup: 
0,02/0,00 times
   Size:     100 | nUnique:    10 | Entropy: 100,00% | DDC:     336 bytes | 
DDCLZW:     392 bytes | Memory reduction:  -16,67% | De-/Compression speedup: 
0,00/0,00 times
   Size:     100 | nUnique:    20 | Entropy: 100,00% | DDC:     416 bytes | 
DDCLZW:     552 bytes | Memory reduction:  -32,69% | De-/Compression speedup: 
0,00/0,00 times
   Size:     100 | nUnique:    50 | Entropy: 100,00% | DDC:     656 bytes | 
DDCLZW:     952 bytes | Memory reduction:  -45,12% | De-/Compression speedup: 
0,01/0,00 times
   Size:     100 | nUnique:   100 | Entropy: 100,00% | DDC:    1056 bytes | 
DDCLZW:    1352 bytes | Memory reduction:  -28,03% | De-/Compression speedup: 
0,00/0,00 times
   ................................... Size: 100000 
...................................
   Size:  100000 | nUnique:     2 | Entropy: 100,00% | DDC:    6420 bytes | 
DDCLZW:    2696 bytes | Memory reduction:   58,01% | De-/Compression speedup: 
0,00/0,00 times
   Size:  100000 | nUnique:     3 | Entropy: 100,00% | DDC:  100184 bytes | 
DDCLZW:    3272 bytes | Memory reduction:   96,73% | De-/Compression speedup: 
0,00/0,00 times
   Size:  100000 | nUnique:     5 | Entropy: 100,00% | DDC:  100200 bytes | 
DDCLZW:    4192 bytes | Memory reduction:   95,82% | De-/Compression speedup: 
0,00/0,00 times
   Size:  100000 | nUnique:    10 | Entropy: 100,00% | DDC:  100240 bytes | 
DDCLZW:    5872 bytes | Memory reduction:   94,14% | De-/Compression speedup: 
0,00/0,00 times
   Size:  100000 | nUnique:    20 | Entropy: 100,00% | DDC:  100320 bytes | 
DDCLZW:    8312 bytes | Memory reduction:   91,71% | De-/Compression speedup: 
0,00/0,00 times
   Size:  100000 | nUnique:    50 | Entropy: 100,00% | DDC:  100560 bytes | 
DDCLZW:   13152 bytes | Memory reduction:   86,92% | De-/Compression speedup: 
0,00/0,00 times
   Size:  100000 | nUnique:   100 | Entropy: 100,00% | DDC:  100960 bytes | 
DDCLZW:   18952 bytes | Memory reduction:   81,23% | De-/Compression speedup: 
0,00/0,00 times
   Size:  100000 | nUnique:   200 | Entropy: 100,00% | DDC:  101760 bytes | 
DDCLZW:   27352 bytes | Memory reduction:   73,12% | De-/Compression speedup: 
0,00/0,00 times
   Size:  100000 | nUnique:   256 | Entropy:  99,99% | DDC:  102208 bytes | 
DDCLZW:   30896 bytes | Memory reduction:   69,77% | De-/Compression speedup: 
0,00/0,00 times
   Size:  100000 | nUnique:   257 | Entropy: 100,00% | DDC:  202216 bytes | 
DDCLZW:   30992 bytes | Memory reduction:   84,67% | De-/Compression speedup: 
0,00/0,00 times
   Size:  100000 | nUnique:   500 | Entropy: 100,00% | DDC:  204160 bytes | 
DDCLZW:   44152 bytes | Memory reduction:   78,37% | De-/Compression speedup: 
0,00/0,00 times
   Size:  100000 | nUnique:  1000 | Entropy: 100,00% | DDC:  208160 bytes | 
DDCLZW:   64152 bytes | Memory reduction:   69,18% | De-/Compression speedup: 
0,00/0,00 times
   Size:  100000 | nUnique: 10000 | Entropy: 100,00% | DDC:  280160 bytes | 
DDCLZW:  240152 bytes | Memory reduction:   14,28% | De-/Compression speedup: 
0,00/0,00 times
   Size:  100000 | nUnique: 65536 | Entropy:  71,34% | DDC:  724448 bytes | 
DDCLZW:  787632 bytes | Memory reduction:   -8,72% | De-/Compression speedup: 
0,00/0,00 times
   Size:  100000 | nUnique: 65537 | Entropy:  71,34% | DDC:  824496 bytes | 
DDCLZW:  787648 bytes | Memory reduction:    4,47% | De-/Compression speedup: 
0,00/0,00 times
   Size:  100000 | nUnique: 80000 | Entropy:  84,43% | DDC:  940200 bytes | 
DDCLZW:  960952 bytes | Memory reduction:   -2,21% | De-/Compression speedup: 
0,00/0,00 times
   
   
   
================================================================================
   Benchmark: benchmarkDistributed using getExactSizeOnDisk
   
================================================================================
   
   ................................... Size: 100 
...................................
   Size:     100 | nUnique:     2 | Entropy: 100,00% | DDC:      52 bytes | 
DDCLZW:     119 bytes | Memory reduction: -128,85% | De-/Compression speedup: 
0,01/0,00 times
   Size:     100 | nUnique:     3 | Entropy:  99,99% | DDC:     144 bytes | 
DDCLZW:     147 bytes | Memory reduction:   -2,08% | De-/Compression speedup: 
0,01/0,00 times
   Size:     100 | nUnique:     5 | Entropy: 100,00% | DDC:     160 bytes | 
DDCLZW:     183 bytes | Memory reduction:  -14,38% | De-/Compression speedup: 
0,01/0,00 times
   Size:     100 | nUnique:    10 | Entropy: 100,00% | DDC:     200 bytes | 
DDCLZW:     263 bytes | Memory reduction:  -31,50% | De-/Compression speedup: 
0,02/0,00 times
   Size:     100 | nUnique:    20 | Entropy: 100,00% | DDC:     280 bytes | 
DDCLZW:     423 bytes | Memory reduction:  -51,07% | De-/Compression speedup: 
0,00/0,00 times
   Size:     100 | nUnique:    50 | Entropy: 100,00% | DDC:     520 bytes | 
DDCLZW:     823 bytes | Memory reduction:  -58,27% | De-/Compression speedup: 
0,01/0,00 times
   Size:     100 | nUnique:   100 | Entropy: 100,00% | DDC:     920 bytes | 
DDCLZW:    1223 bytes | Memory reduction:  -32,93% | De-/Compression speedup: 
0,00/0,00 times
   ................................... Size: 100000 
...................................
   Size:  100000 | nUnique:     2 | Entropy: 100,00% | DDC:   12540 bytes | 
DDCLZW:    2567 bytes | Memory reduction:   79,53% | De-/Compression speedup: 
0,00/0,00 times
   Size:  100000 | nUnique:     3 | Entropy: 100,00% | DDC:  100044 bytes | 
DDCLZW:    3147 bytes | Memory reduction:   96,85% | De-/Compression speedup: 
0,00/0,00 times
   Size:  100000 | nUnique:     5 | Entropy: 100,00% | DDC:  100060 bytes | 
DDCLZW:    4063 bytes | Memory reduction:   95,94% | De-/Compression speedup: 
0,00/0,00 times
   Size:  100000 | nUnique:    10 | Entropy: 100,00% | DDC:  100100 bytes | 
DDCLZW:    5743 bytes | Memory reduction:   94,26% | De-/Compression speedup: 
0,00/0,00 times
   Size:  100000 | nUnique:    20 | Entropy: 100,00% | DDC:  100180 bytes | 
DDCLZW:    8183 bytes | Memory reduction:   91,83% | De-/Compression speedup: 
0,00/0,00 times
   Size:  100000 | nUnique:    50 | Entropy: 100,00% | DDC:  100420 bytes | 
DDCLZW:   13023 bytes | Memory reduction:   87,03% | De-/Compression speedup: 
0,00/0,00 times
   Size:  100000 | nUnique:   100 | Entropy: 100,00% | DDC:  100820 bytes | 
DDCLZW:   18823 bytes | Memory reduction:   81,33% | De-/Compression speedup: 
0,00/0,00 times
   Size:  100000 | nUnique:   200 | Entropy: 100,00% | DDC:  101620 bytes | 
DDCLZW:   27223 bytes | Memory reduction:   73,21% | De-/Compression speedup: 
0,00/0,00 times
   Size:  100000 | nUnique:   256 | Entropy:  99,99% | DDC:  102068 bytes | 
DDCLZW:   30767 bytes | Memory reduction:   69,86% | De-/Compression speedup: 
0,00/0,00 times
   Size:  100000 | nUnique:   257 | Entropy: 100,00% | DDC:  202076 bytes | 
DDCLZW:   30867 bytes | Memory reduction:   84,73% | De-/Compression speedup: 
0,00/0,00 times
   Size:  100000 | nUnique:   500 | Entropy: 100,00% | DDC:  204020 bytes | 
DDCLZW:   44023 bytes | Memory reduction:   78,42% | De-/Compression speedup: 
0,00/0,00 times
   Size:  100000 | nUnique:  1000 | Entropy: 100,00% | DDC:  208020 bytes | 
DDCLZW:   64023 bytes | Memory reduction:   69,22% | De-/Compression speedup: 
0,00/0,00 times
   Size:  100000 | nUnique: 10000 | Entropy: 100,00% | DDC:  280020 bytes | 
DDCLZW:  240023 bytes | Memory reduction:   14,28% | De-/Compression speedup: 
0,00/0,00 times
   Size:  100000 | nUnique: 65536 | Entropy:  71,34% | DDC:  724308 bytes | 
DDCLZW:  787507 bytes | Memory reduction:   -8,73% | De-/Compression speedup: 
0,00/0,00 times
   Size:  100000 | nUnique: 65537 | Entropy:  71,34% | DDC:  824316 bytes | 
DDCLZW:  787519 bytes | Memory reduction:    4,46% | De-/Compression speedup: 
0,00/0,00 times
   Size:  100000 | nUnique: 80000 | Entropy:  84,43% | DDC:  940020 bytes | 
DDCLZW:  960823 bytes | Memory reduction:   -2,21% | De-/Compression speedup: 
0,00/0,00 times
   ```

