jainankitk commented on issue #12317:
URL: https://github.com/apache/lucene/issues/12317#issuecomment-1579572674

   > In general, we don't like adding options to file formats and prefer to 
have full control to keep file formats easy to reason about and to test.
   > I'm just wondering if disabling this compression is something that users 
would actually be interested in, as I question how it might impact the query 
performance.
   
   Since I don't have concrete evidence of performance degradation, it seems 
reasonable not to add an option, which keeps the testing overhead limited.
   
   > So I wouldn't generally expect it to be a big contributor to a heap 
profile unless there are many small segments getting written, which could 
happen if you do frequent refreshes, have many fields, or many indices. In that 
case it's possible that LZ4 compression never gets used on some fields/segments 
because of the checks on prefix length and average suffix length, so your idea 
to lazily allocate this compression hash table might help?
   
   Per field per segment looks reasonably high to me, given that each of these 
allocates 256 KB (128 KB for the short[] and 128 KB for the int[]). I have seen 
index mappings with up to 1500 fields, although not all of them are text 
fields. But for these very large documents, we are talking about a couple 
hundred MB. And due to the tiered merge policy, every segment might get merged 
a few times. Hence, it does make sense to lazily allocate this compression 
hash table.
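
To make the lazy-allocation idea concrete, here is a minimal sketch. The class 
and method names (`LazyCompressionTables`, `prepare`, `isAllocated`) are 
hypothetical and are not Lucene's actual API; the array lengths are only chosen 
so that each array occupies the 128 KB mentioned above.

```java
import java.util.Arrays;

// Hypothetical sketch of lazy allocation; not Lucene's actual hash table class.
final class LazyCompressionTables {
  // Assumed lengths, picked so each array is 128 KB as described above.
  static final int SHORT_TABLE_LENGTH = 1 << 16; // 65536 shorts * 2 bytes = 128 KB
  static final int INT_TABLE_LENGTH = 1 << 15;   // 32768 ints * 4 bytes = 128 KB

  private short[] shortTable; // stays null until compression is actually attempted
  private int[] intTable;     // stays null until compression is actually attempted

  /** Returns true once the backing arrays have been allocated. */
  boolean isAllocated() {
    return intTable != null;
  }

  /** Allocates the arrays on first use; on later calls only clears them. */
  void prepare() {
    if (intTable == null) {
      shortTable = new short[SHORT_TABLE_LENGTH];
      intTable = new int[INT_TABLE_LENGTH];
    } else {
      Arrays.fill(shortTable, (short) 0);
      Arrays.fill(intTable, 0);
    }
  }

  /** Heap used by the two arrays once allocated, ignoring object headers. */
  static long bytesWhenAllocated() {
    return (long) SHORT_TABLE_LENGTH * Short.BYTES
        + (long) INT_TABLE_LENGTH * Integer.BYTES;
  }
}
```

With this shape, a writer for a field/segment that never passes the prefix 
length and average suffix length checks would never call `prepare()`, so it 
would not pay the 256 KB. If all 1500 fields allocated eagerly, the upper bound 
would be 1500 * 256 KB, roughly 375 MB, which is consistent with the "couple 
hundred MB" estimate when only some fields are text fields.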
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


