Pulkitg64 opened a new pull request, #15549:
URL: https://github.com/apache/lucene/pull/15549

   ### Description
   
   This draft PR explores storing float vectors with 2 bytes per dimension
   (half-float/FP16) instead of 4 bytes (FP32), which reduces vector disk usage
   by approximately 50%. Vectors are stored on disk in half-float format and
   converted back to 32-bit floats for dot-product computations during search
   and index merges. This conversion adds overhead to every vector read,
   resulting in slower indexing and search.
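
To make the store-as-FP16, score-as-FP32 idea concrete, here is a minimal sketch. It is not the code in this PR: the class `HalfFloatSketch` and its methods are hypothetical names, the little-endian byte layout is an assumption, and it relies on `Float.floatToFloat16` / `Float.float16ToFloat`, which need Java 20 or newer.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration only; names and byte layout are assumptions.
final class HalfFloatSketch {

  /** Encodes each FP32 value as a 2-byte IEEE 754 binary16 value. */
  static byte[] encode(float[] vector) {
    ByteBuffer buf =
        ByteBuffer.allocate(vector.length * Short.BYTES).order(ByteOrder.LITTLE_ENDIAN);
    for (float v : vector) {
      // Rounds to the nearest representable half-float (lossy for most values).
      buf.putShort(Float.floatToFloat16(v));
    }
    return buf.array();
  }

  /** Decodes 2-byte half-floats back to FP32; this is the per-read overhead. */
  static float[] decode(byte[] bytes) {
    ByteBuffer buf = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
    float[] out = new float[bytes.length / Short.BYTES];
    for (int i = 0; i < out.length; i++) {
      out[i] = Float.float16ToFloat(buf.getShort());
    }
    return out;
  }

  /** Plain scalar dot product over FP32 values. */
  static float dotProduct(float[] a, float[] b) {
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  public static void main(String[] args) {
    float[] query = {0.5f, -0.25f, 0.125f};
    float[] doc = {0.1f, 0.2f, 0.3f};

    byte[] stored = encode(doc);      // what would be written to disk
    float[] restored = decode(stored); // what the scorer would see

    System.out.println(stored.length + " bytes stored vs " + doc.length * Float.BYTES);
    System.out.println("dot (FP16 stored): " + dotProduct(query, restored));
    System.out.println("dot (FP32 stored): " + dotProduct(query, doc));
  }
}
```

The decode step runs on every vector read, which is where the extra search and merge cost described above would come from in a scalar implementation like this one.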
   
   This is an early draft to gather community feedback on the viability and
   direction of this implementation.
   
   ToDo: Support for MemorySegmentVectorScorer with half-float vectors is yet 
to be implemented.
   
   
   * Benchmark Results:
   
   Without quantization, search latency more than doubles (11.392 ms vs.
   4.942 ms). With 8-bit quantization there is no latency regression, but with
   4-bit quantization latency regresses by about 14% (6.069 ms vs. 5.343 ms).
   The indexing rate drops by roughly 17-24% across all quantization settings.
   
   
   | Encoding | recall | latency(ms) | quantized | index(s) | index_docs/s | index_size(MB) | vec_disk(MB) | vec_RAM(MB) |
   |----------|--------|-------------|-----------|----------|--------------|----------------|--------------|-------------|
   | float16  | 0.991  | 11.392      | no        | 34.8     | 2873.81      | 206.22         | 390.625      | 390.625     |
   | float16  | 0.981  | 4.337       | 8 bits    | 41.55    | 2406.97      | 305.4          | 294.495      | 99.182      |
   | float16  | 0.926  | 6.069       | 4 bits    | 42.07    | 2376.93      | 256.58         | 245.667      | 50.354      |
   | float32  | 0.991  | 4.942       | no        | 28.93    | 3456.38      | 401.53         | 390.625      | 390.625     |
   | float32  | 0.981  | 4.367       | 8 bits    | 32.04    | 3121.49      | 500.71         | 489.807      | 99.182      |
   | float32  | 0.926  | 5.343       | 4 bits    | 32.12    | 3113.33      | 451.91         | 440.979      | 50.354      |
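
For readers who want to check the relative changes themselves, the sketch below recomputes them from the table rows, taking float32 as the baseline for each quantization setting. The class `BenchmarkDeltas` and its helper are hypothetical names for illustration; the numbers are copied from the table.

```java
// Hypothetical helper for illustration; values are copied from the table above.
final class BenchmarkDeltas {

  /** Relative change of candidate versus baseline, in percent. */
  static double percentChange(double baseline, double candidate) {
    return (candidate - baseline) / baseline * 100.0;
  }

  public static void main(String[] args) {
    String[] labels = {"no quantization", "8 bits", "4 bits"};

    // Search latency in ms.
    double[] f32Latency = {4.942, 4.367, 5.343};
    double[] f16Latency = {11.392, 4.337, 6.069};

    // Indexing rate in docs/s.
    double[] f32Rate = {3456.38, 3121.49, 3113.33};
    double[] f16Rate = {2873.81, 2406.97, 2376.93};

    // Total index size in MB.
    double[] f32Size = {401.53, 500.71, 451.91};
    double[] f16Size = {206.22, 305.4, 256.58};

    for (int i = 0; i < labels.length; i++) {
      System.out.printf(
          "%-16s latency %+6.1f%%  indexing rate %+6.1f%%  index size %+6.1f%%%n",
          labels[i],
          percentChange(f32Latency[i], f16Latency[i]),
          percentChange(f32Rate[i], f16Rate[i]),
          percentChange(f32Size[i], f16Size[i]));
    }
  }
}
```

A positive value means float16 is higher than float32 for that metric, so positive latency is a regression while negative index size is a saving.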
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

