mccullocht commented on PR #15736:
URL: https://github.com/apache/lucene/pull/15736#issuecomment-3936637232

   Good to see that this improves throughput on graviton3!
   
   `convert()` does produce a vector of the same bit width. My point is that if 
I have a uint16x8 vector and want to widen it to uint32 lanes, I have to convert 
twice to see all the data in the input vector (two uint32x4 registers). It can 
sometimes be trickier to extract values at the top of the register (the last 4 
16-bit entries in this case), and maybe the API is choosing a poor plan for this.
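   
   To make the "convert twice" point concrete, here is a minimal scalar sketch 
of the lane semantics, not the actual Vector API call. It assumes the 
conversion is shape-invariant and takes a part index, as I believe 
`convert(VectorOperators.ZERO_EXTEND_S2I, part)` does on a 128-bit 
`ShortVector`. The class and method names (`WidenParts`, `widenPart`) are 
made up for illustration.

```java
import java.util.Arrays;

public class WidenParts {
    // Scalar model (an assumption, not the real API) of a shape-invariant
    // widening conversion: 8 x 16-bit lanes in, 4 x 32-bit lanes out.
    // 'part' selects which half of the input lanes gets widened.
    static int[] widenPart(short[] lanes, int part) {
        int outLanes = lanes.length / 2;
        int[] out = new int[outLanes];
        for (int i = 0; i < outLanes; i++) {
            // Zero-extend: treat the 16-bit lane as unsigned.
            out[i] = lanes[part * outLanes + i] & 0xFFFF;
        }
        return out;
    }

    public static void main(String[] args) {
        short[] v = {1, 2, 3, 4, 5, 6, 7, (short) 0xFFFF};
        // Part 0 covers the low four lanes, part 1 the high four lanes.
        System.out.println(Arrays.toString(widenPart(v, 0))); // [1, 2, 3, 4]
        System.out.println(Arrays.toString(widenPart(v, 1))); // [5, 6, 7, 65535]
    }
}
```

   Part 1 is the case I suspect is costly, since it reads the upper half of 
the source register.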
   
   I'll try to run luceneutil this afternoon to see if this changes things in 
the macro benchmark. It's probably also worth running the microbenchmark in 
branch_10x with jdk21 to confirm whether this should target 10.5 instead of 11.0.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


