benwtrent commented on PR #15549: URL: https://github.com/apache/lucene/pull/15549#issuecomment-3711735284
@Pulkitg64 the latency is the main concern IMO. We must copy the vectors onto heap (we know this is expensive), transform the bytes to `float32` (an additional cost), and only then run the float32 Panama vector operations (which are super fast). I would expect this to also impact quantized query time for anything that must rescore (though likely with less impact, as that involves fewer vectors to decode).

I wonder if all the cost is spent just decoding the vectors? What does a flame graph tell you? Also, could you indicate your JVM version, etc.?

See this interesting JEP update on the ever-incubating Vector API: https://openjdk.org/jeps/508

> Addition, subtraction, division, multiplication, square root, and fused multiply/add operations on Float16 values are now auto-vectorized on supporting x64 CPUs.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
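For context, the decode path being discussed (heap copy, then a per-dimension `float16` to `float32` widening, then plain float32 arithmetic) can be sketched roughly as below. This is an illustrative sketch assuming a JDK with `Float.float16ToFloat` (JDK 20+), not Lucene's actual vector-reader code; the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch of the decode-then-score path: each scored document
// pays for (1) wrapping the raw on-heap bytes, (2) one float16 -> float32
// widening per dimension, and only then (3) the fast float32 math.
public class Fp16DotSketch {

    // Decode little-endian float16 bytes into a float32[] -- the extra
    // per-vector cost that a flame graph would attribute to decoding.
    static float[] decodeFp16(byte[] raw) {
        ByteBuffer bb = ByteBuffer.wrap(raw).order(ByteOrder.LITTLE_ENDIAN);
        float[] out = new float[raw.length / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = Float.float16ToFloat(bb.getShort());
        }
        return out;
    }

    // Plain float32 dot product; in Lucene this step would use the
    // Panama Vector API and is comparatively cheap.
    static float dot(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }
}
```

JEP 508's auto-vectorization of Float16 arithmetic could, in principle, shrink step (2) by letting the widening and the math vectorize together rather than forcing a scalar decode loop first.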
