churromorales commented on PR #12408: URL: https://github.com/apache/druid/pull/12408#issuecomment-1099559757
Hi @clintropolis, I ran the tests with the parameters, but had to run them with `java -jar` as the parameterization did not work with the command you provided above. What I found was that zstandard was slower than lz4; I believe all of these tests are reading out of memory. To verify, I added the `none` compression option to the `ColumnarLongsSelectRowsFromGeneratorBenchmark`. If we are reading out of memory, then any codec that decompresses slower than the memory bus (after adjusting for the compression ratio) will perform slower. I did see that while zstd was slower than lz4 for these tests, lz4 was slower than having no compression at all.

For this patch, I think it would be interesting to look at it from a tiering standpoint in Druid. For segment files that can be memory mapped, having no compression or an ultra-fast compression library is best (while sacrificing compression ratio). But for a cold tier, where you are not always allocating enough memory to guarantee the segment files are memory mapped and you may care more about the space requirement, it might be best to use a library like zstandard. Anyways, I'll post the results for you to take a look at.

I also turned on zstandard in one of our clusters. I had two datasources, both reading from the same Kafka topic: one using zstd, one lz4. After ingesting a few TB of data, we saw that zstd's footprint was about 8-10% smaller than lz4's.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
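The "reading out of memory" reasoning above can be sketched with a back-of-envelope model: once the segment is already in RAM, a codec only wins if its decompression throughput, scaled by its compression ratio, beats a plain memory copy. All throughput and ratio numbers below are illustrative assumptions, not figures from the Druid benchmarks.

```python
# Back-of-envelope model: does decompressing beat a plain memory read?
# All numbers are illustrative assumptions, NOT benchmark results.

MEMORY_BANDWIDTH_GBPS = 20.0  # assumed read speed with no compression

# codec -> (decompression throughput over compressed input in GB/s,
#           assumed compression ratio)
CODECS = {
    "lz4":  (3.0, 2.0),
    "zstd": (1.0, 2.3),
}

def effective_read_gbps(decompress_gbps: float, ratio: float) -> float:
    """GB/s of uncompressed data delivered to the reader:
    decompression speed over compressed bytes, scaled by the ratio."""
    return decompress_gbps * ratio

for name, (speed, ratio) in CODECS.items():
    eff = effective_read_gbps(speed, ratio)
    verdict = "slower" if eff < MEMORY_BANDWIDTH_GBPS else "faster"
    print(f"{name}: ~{eff:.1f} GB/s effective -> {verdict} than uncompressed")
```

Under these assumed numbers, both codecs deliver data slower than an uncompressed in-memory read, matching the ordering observed in the benchmarks (none > lz4 > zstd); on a tier where segments spill to disk, the disk's lower bandwidth changes the arithmetic in favor of higher compression ratios.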
