[ https://issues.apache.org/jira/browse/LUCENE-9486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17185904#comment-17185904 ]
Adrien Grand commented on LUCENE-9486:
--------------------------------------

I played with various configurations and ended up with a preset dictionary of 4kB combined with 10 sub blocks of 60kB, which gives interesting results. Here are some benchmarks on the same datasets as LUCENE-9447:

On highly compressible JSON logs:

||Method||Index size (MB)||Index time (s)||Avg fetch time (us)||
|LZ4(16kB) (current BEST_SPEED)|304.2|9|5|
|LZ4(60kB)|141.7|7.5|10|
|LZ4(256kB)|105.1|7.5|33|
|LZ4(1MB)|96.5|7.5|115|
|LZ4 with preset dict (new BEST_SPEED)|91.9|7.5|16|
|Deflate with preset dict (BEST_COMPRESSION)|64.9|14|41|

On enwiki documents:

||Method||Index size (MB)||Index time (s)||Avg fetch time (us)||
|LZ4(16kB) (current BEST_SPEED)|558.8|14.5|83|
|LZ4(60kB)|526.2|15|120|
|LZ4(256kB)|523.1|15|323|
|LZ4(1MB)|521.3|15.5|1151|
|LZ4 with preset dict (new BEST_SPEED)|515.2|15|135|
|Deflate with preset dict (BEST_COMPRESSION)|338.0|35|250|

The new BEST_SPEED makes fetch times a bit slower than the current one, which I think is fair given that these fetch times are still well under the cost of a page fault. Indexing remains as fast as today, and the index gets 3.3x smaller on the JSON logs and 8% smaller on the enwiki documents.

I also included the results with BEST_COMPRESSION (the Deflate rows) in the above benchmarks to show the trade-off that users are making when going with one versus the other.

> Explore using preset dictionaries with LZ4 for stored fields
> ------------------------------------------------------------
>
>                 Key: LUCENE-9486
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9486
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Follow-up of LUCENE-9447: using preset dictionaries with DEFLATE provided
> very significant gains. Adding support for preset dictionaries with LZ4 would
> be easy so let's give it a try?
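To make the block layout from the comment concrete (a 4kB preset dictionary followed by 10 sub blocks of 60kB), here is a minimal Python sketch. It is not Lucene's implementation. It uses DEFLATE through the standard `zlib` module as a stand-in, because the Python standard library has no LZ4 codec, and the names `compress_block` and `read_sub_block` are hypothetical. The point it illustrates is that each sub block is compressed against the same dictionary, so fetching a document should only require decompressing the dictionary plus one sub block rather than the whole block.

```python
import zlib

# Sizes taken from the comment; everything else here is an assumption.
DICT_SIZE = 4 * 1024          # preset dictionary: first 4kB of the block
SUB_BLOCK_SIZE = 60 * 1024    # each sub block: 60kB
NUM_SUB_BLOCKS = 10
BLOCK_SIZE = DICT_SIZE + NUM_SUB_BLOCKS * SUB_BLOCK_SIZE


def compress_block(block: bytes) -> list[bytes]:
    """Return [compressed dictionary, compressed sub block 0, 1, ...]."""
    assert len(block) <= BLOCK_SIZE, "caller splits input into blocks first"
    dictionary, rest = block[:DICT_SIZE], block[DICT_SIZE:]
    # The dictionary itself is compressed on its own, without a preset.
    chunks = [zlib.compress(dictionary)]
    for start in range(0, len(rest), SUB_BLOCK_SIZE):
        # Every sub block gets a fresh compressor primed with the dictionary,
        # so sub blocks stay independently decompressible.
        compressor = zlib.compressobj(zdict=dictionary)
        sub_block = rest[start:start + SUB_BLOCK_SIZE]
        chunks.append(compressor.compress(sub_block) + compressor.flush())
    return chunks


def read_sub_block(chunks: list[bytes], index: int) -> bytes:
    """Decompress only the dictionary and the requested sub block."""
    dictionary = zlib.decompress(chunks[0])
    decompressor = zlib.decompressobj(zdict=dictionary)
    return decompressor.decompress(chunks[1 + index]) + decompressor.flush()
```

A caller would cut the stored-fields byte stream into pieces of at most `BLOCK_SIZE` bytes, pass each to `compress_block`, and call `read_sub_block` with the index of the sub block holding the wanted document. The per-sub-block compressor is what keeps fetch cost bounded: a larger single LZ4 block (the 256kB and 1MB rows in the tables) compresses well but has to be decompressed up to the document being read.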