[ 
https://issues.apache.org/jira/browse/LUCENE-9447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17177806#comment-17177806
 ] 

Robert Muir commented on LUCENE-9447:
-------------------------------------

{quote}
but I still worry that preset dictionaries would be challenging to integrate 
while still copying compressed data when merging as we can only do it if blocks 
use the same dictionary?
{quote}

Yes, it probably needs total redesign/thought. The copying of compressed data 
is really important so that merge isn't horrible...

> Make BEST_COMPRESSION compress more aggressively?
> -------------------------------------------------
>
>                 Key: LUCENE-9447
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9447
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> The Lucene86 codec supports setting a "Mode" for stored fields compression, 
> that is either "BEST_SPEED", which translates to blocks of 16kB or 128 
> documents (whichever is hit first) compressed with LZ4, or 
> "BEST_COMPRESSION", which translates to blocks of 60kB or 512 documents 
> compressed with DEFLATE with default compression level (6).
> After looking at indices that spent most disk space on stored fields 
> recently, I noticed that there was quite some room for improvement by 
> increasing the block size even further:
> ||Block size||Stored fields size||
> |60kB|168412338|
> |128kB|130813639|
> |256kB|113587009|
> |512kB|104776378|
> |1MB|100367095|
> |2MB|98152464|
> |4MB|97034425|
> |8MB|96478746|
> For this specific dataset, I had 1M documents that each had about 2kB of 
> stored fields each and quite some redundancy.
> This makes me want to look into bumping this block size to maybe 256kB. It 
> would be interesting to re-do the experiments we did on LUCENE-6100 to see 
> how this affects the merging speed. That said I don't think it would be 
> terrible if the merging time increased a bit given that we already offer the 
> BEST_SPEED option for CPU-savvy users.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to