CaoManhDat commented on issue #1395: SOLR-14365: CollapsingQParser - Avoiding 
always allocate int[] and float[] with size equals to number of unique values 
(WIP)
URL: https://github.com/apache/lucene-solr/pull/1395#issuecomment-610426625
 
 
   Thanks @bruno-roustant.
   One clarification here, let's says that we pay `n` bytes (let's assume is 1 
byte) to store n elements. If we specify `n` as expected elements for hashMap, 
the total memory we pay is 
   ```
   arraySize = nextPowerOf2(n / loadFactor)
   memory = arraySize * 2
   ```
   But that is not enough to store `n` elements, since the hashMap will resize 
when number of elements are greater than `arraySize * loadFactor`.
   
   For example if n = 2^20 (1 million), loadFactor = 0.75
   ```
   arraySize = 2^21
   memory = 2^22 bytes 
   ```
   Resizing will happen at 2^21 * 0.75 = 1.5 mil
   ```
   arraySize = 2^22
   memory = 2^23 bytes
   ```
   so in total we pay 2^22 + 2^23 bytes = 12.5 mil bytes.
   
   Therefore the amount of memory we pay for storing n elements with hashMap is 
12.5 times comparing to array. So we have a simple table here
   | Load  | Array / HashMap mem used |
   | ------------- | ------------- | 
   | 1.5%  | 5.3  |
   | 3%  | 2.67 |
   | 5% | 1.6 |
   | 8% | 1.0 |
   | 15% | 0.53 |
   
   Do not forget that rehashing can be slow depending on hashing and keys. So I 
kinda feel the current setup is good and it will help to covers most of cases. 
We of course can introduce a parameter that user can tune to trade-off some 
memory vs performance (only matter when they do collapsing on 1.5% -> 5% of 
their total unique values). But we can do that in another issue.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to