LukaDeka commented on PR #2398: URL: https://github.com/apache/systemds/pull/2398#issuecomment-3781072027
Added a few benchmarks that mostly compare memory usage as well as operation times for methods (so far, only for `getIdx`). Right now, `DDCLZW` is only compared against `DDC`. There are sizable memory savings for datasets with repeating patterns and for large datasets:

```r
================================================================================
Benchmark: benchmarkRandomData
================================================================================
Size:      1 | DDC:     61 bytes | DDCLZW:     67 bytes | Memory reduction:  -9.84% | De-/Compression speedup: 0.09/0.00 times
Size:     10 | DDC:     70 bytes | DDCLZW:     95 bytes | Memory reduction: -35.71% | De-/Compression speedup: 0.04/0.00 times
Size:    100 | DDC:    160 bytes | DDCLZW:    299 bytes | Memory reduction: -86.87% | De-/Compression speedup: 0.01/0.00 times
Size:   1000 | DDC:   1060 bytes | DDCLZW:   1551 bytes | Memory reduction: -46.32% | De-/Compression speedup: 0.00/0.00 times
Size:  10000 | DDC:  10060 bytes | DDCLZW:  10487 bytes | Memory reduction:  -4.24% | De-/Compression speedup: 0.00/0.00 times
Size: 100000 | DDC: 100060 bytes | DDCLZW:  78783 bytes | Memory reduction:  21.26% | De-/Compression speedup: 0.00/0.00 times
```

I also added the `De-/Compression speedup` field so that other compression types can be compared with each other as well.
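For reference, the `Memory reduction` column is consistent with the bytes saved relative to the `DDC` baseline, i.e. `100 * (DDC - DDCLZW) / DDC`. The sketch below recomputes it for a few of the rows above; the class and method names are hypothetical and not part of the PR, and the formula is inferred from the reported numbers rather than taken from the benchmark code.

```java
public class MemoryReduction {
    // Assumed formula: percentage of bytes saved relative to the DDC baseline.
    // Negative values mean the DDCLZW representation is larger than DDC.
    static double reductionPercent(long ddcBytes, long ddclzwBytes) {
        return 100.0 * (ddcBytes - ddclzwBytes) / ddcBytes;
    }

    public static void main(String[] args) {
        // Byte counts copied from the benchmarkRandomData output above.
        long[][] rows = { {61, 67}, {1060, 1551}, {100060, 78783} };
        for (long[] r : rows) {
            System.out.printf(
                "DDC: %d bytes | DDCLZW: %d bytes | Memory reduction: %.2f%%%n",
                r[0], r[1], reductionPercent(r[0], r[1]));
        }
    }
}
```

Under this reading, a negative value simply means the LZW-encoded mapping plus its overhead is larger than the plain `DDC` mapping, which matches the small random inputs.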
I also added a benchmark for slicing, but it doesn't look too useful at the moment:

```r
================================================================================
Benchmark: benchmarkSlice
================================================================================
Size:      1 | Slice[    0:    0] | DDC: 0 ms | DDCLZW:  1 ms | Slowdown:   37.09 times
Size:     10 | Slice[    2:    7] | DDC: 0 ms | DDCLZW: 20 ms | Slowdown: 1141.72 times
Size:    100 | Slice[   25:   75] | DDC: 0 ms | DDCLZW:  3 ms | Slowdown:  169.34 times
Size:   1000 | Slice[  250:  750] | DDC: 0 ms | DDCLZW:  3 ms | Slowdown:  348.98 times
Size:  10000 | Slice[ 2500: 7500] | DDC: 0 ms | DDCLZW:  6 ms | Slowdown:  483.40 times
Size: 100000 | Slice[25000:75000] | DDC: 0 ms | DDCLZW: 24 ms | Slowdown:  325.22 times
```

The file might also be in the wrong directory and wrongly labeled as a "test". We wouldn't want benchmarks running on every GitHub Actions trigger. Would it make more sense to refactor it into a `main` function?
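To make the `main` option concrete, here is a minimal sketch of what a standalone benchmark entry point could look like. Everything in it is hypothetical: the class name, the helper methods, and the two placeholder workloads, which only stand in for the real `DDC` and `DDCLZW` slice calls. Because the class has no test annotations and is started via `main`, it would only run when invoked explicitly.

```java
import java.util.Arrays;

public class SliceBenchmarkSketch {
    // Mean duration of one call in nanoseconds, averaged over `reps` runs.
    // System.nanoTime() avoids the "0 ms" readings that millisecond timers give.
    static double meanNanos(Runnable op, int reps) {
        long start = System.nanoTime();
        for (int i = 0; i < reps; i++) {
            op.run();
        }
        return (System.nanoTime() - start) / (double) reps;
    }

    // How many times slower the candidate is than the baseline.
    static double slowdown(double baselineNanos, double candidateNanos) {
        return candidateNanos / baselineNanos;
    }

    public static void main(String[] args) {
        int[] data = new int[100_000];
        int[][] sink = new int[1][]; // keeps results reachable so the work is not trivially dropped

        // Placeholder workloads (assumptions): replace with the real slice calls.
        Runnable baseline = () -> sink[0] = Arrays.copyOfRange(data, 25_000, 75_000);
        Runnable candidate = () -> {
            int[] full = Arrays.copyOf(data, data.length);
            sink[0] = Arrays.copyOfRange(full, 25_000, 75_000);
        };

        double base = meanNanos(baseline, 1_000);
        double cand = meanNanos(candidate, 1_000);
        System.out.printf("Baseline: %.0f ns | Candidate: %.0f ns | Slowdown: %.2f times%n",
            base, cand, slowdown(base, cand));
    }
}
```

This is a plain timing loop without warm-up or JIT controls, so its numbers would be rough; a dedicated harness would be more reliable if precise measurements matter.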
