LukaDeka commented on PR #2398:
URL: https://github.com/apache/systemds/pull/2398#issuecomment-3781072027

   Added a few benchmarks that compare memory usage and operation times for individual methods (so far, only `getIdx`).
   
   Right now, the only comparison is `DDCLZW` against `DDC`.
   
   There are sizable memory savings for datasets with repeating patterns and for large datasets. On random data, `DDCLZW` only becomes smaller than `DDC` at 100000 elements:
   ```r
   ================================================================================
   Benchmark: benchmarkRandomData
   ================================================================================

   Size:       1 | DDC:       61 bytes | DDCLZW:       67 bytes | Memory reduction:  -9.84% | De-/Compression speedup: 0.09/0.00 times
   Size:      10 | DDC:       70 bytes | DDCLZW:       95 bytes | Memory reduction: -35.71% | De-/Compression speedup: 0.04/0.00 times
   Size:     100 | DDC:      160 bytes | DDCLZW:      299 bytes | Memory reduction: -86.87% | De-/Compression speedup: 0.01/0.00 times
   Size:    1000 | DDC:     1060 bytes | DDCLZW:     1551 bytes | Memory reduction: -46.32% | De-/Compression speedup: 0.00/0.00 times
   Size:   10000 | DDC:    10060 bytes | DDCLZW:    10487 bytes | Memory reduction:  -4.24% | De-/Compression speedup: 0.00/0.00 times
   Size:  100000 | DDC:   100060 bytes | DDCLZW:    78783 bytes | Memory reduction:  21.26% | De-/Compression speedup: 0.00/0.00 times
   ```
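
The `Memory reduction` column appears to be the size difference relative to `DDC`, which would explain why it is negative whenever `DDCLZW` is larger. A minimal sketch, assuming that formula (the class and method names are hypothetical, not taken from the PR):

```java
public class MemoryReduction {
    // Assumed formula: percentage of DDC's size that DDCLZW saves.
    // Negative values mean DDCLZW is larger than DDC.
    static double reductionPercent(long ddcBytes, long ddclzwBytes) {
        return 100.0 * (ddcBytes - ddclzwBytes) / ddcBytes;
    }

    public static void main(String[] args) {
        // Sizes copied from three rows of the benchmarkRandomData output.
        long[][] rows = { {61, 67}, {10060, 10487}, {100060, 78783} };
        for (long[] r : rows)
            System.out.printf("DDC: %d | DDCLZW: %d | reduction: %.2f%%%n",
                r[0], r[1], reductionPercent(r[0], r[1]));
    }
}
```

For these three rows the formula gives -9.84, -4.24 and 21.26 percent, which agrees with the reported values.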
   
   I also added the `De-/Compression speedup` field so that other compression types can be compared with each other as well.
   
   I also added a benchmark for slices, but it doesn't look too useful at the moment:
   ```r
   ================================================================================
   Benchmark: benchmarkSlice
   ================================================================================

   Size:       1 | Slice[    0:    0] | DDC:      0 ms | DDCLZW:      1 ms | Slowdown: 37.09 times
   Size:      10 | Slice[    2:    7] | DDC:      0 ms | DDCLZW:     20 ms | Slowdown: 1141.72 times
   Size:     100 | Slice[   25:   75] | DDC:      0 ms | DDCLZW:      3 ms | Slowdown: 169.34 times
   Size:    1000 | Slice[  250:  750] | DDC:      0 ms | DDCLZW:      3 ms | Slowdown: 348.98 times
   Size:   10000 | Slice[ 2500: 7500] | DDC:      0 ms | DDCLZW:      6 ms | Slowdown: 483.40 times
   Size:  100000 | Slice[25000:75000] | DDC:      0 ms | DDCLZW:     24 ms | Slowdown: 325.22 times
   ```
   
   The file might also be in the wrong directory and wrongly labeled as a "test". We wouldn't want the benchmarks to run on every GitHub Actions trigger.
   
   Would it make more sense to refactor it into a `main` function?
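
One possible shape for that, as a rough sketch: a plain class with a `main` method that is only run on demand, so it is not picked up as a test. `BenchmarkMain`, `medianNanos` and the placeholder workload are hypothetical names, not code from this PR. Timing in nanoseconds would also avoid the `0 ms` readings in the slice output.

```java
import java.util.Arrays;

public class BenchmarkMain {
    // Runs the task `warmup` times untimed, then `reps` times timed,
    // and returns the (upper) median duration in nanoseconds.
    static long medianNanos(Runnable task, int warmup, int reps) {
        for (int i = 0; i < warmup; i++)
            task.run();
        long[] times = new long[reps];
        for (int i = 0; i < reps; i++) {
            long start = System.nanoTime();
            task.run();
            times[i] = System.nanoTime() - start;
        }
        Arrays.sort(times);
        return times[reps / 2];
    }

    public static void main(String[] args) {
        // Placeholder workload; a real benchmark would call the
        // column group methods under comparison here.
        int[] data = new int[100_000];
        long ns = medianNanos(() -> Arrays.fill(data, 1), 5, 20);
        System.out.println("median: " + ns + " ns");
    }
}
```

Using the median over several repetitions after a warm-up reduces the influence of JIT compilation and one-off outliers, such as the 20 ms reading at size 10.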

