rmuir commented on code in PR #13572: URL: https://github.com/apache/lucene/pull/13572#discussion_r1685214866
##########
lucene/core/build.gradle:
##########

@@ -14,12 +14,59 @@
  * See the License for the specific language governing permissions and
  * limitations under the License.
  */
+plugins {
+  id "c"
+}
 apply plugin: 'java-library'
+apply plugin: 'c'

 description = 'Lucene core library'

+model {
+  toolChains {
+    gcc(Gcc) {
+      target("linux_aarch64") {
+        path '/usr/bin/'
+        cCompiler.executable 'gcc10-cc'
+        cCompiler.withArguments { args ->
+          args << "--shared"
+               << "-O3"
+               << "-march=armv8.2-a+dotprod"

Review Comment:
   Oh, the other likely explanation for the performance is that the integer dot product in Java is not AS HORRIBLE on 256-bit SVE as it is on 128-bit NEON. There it more closely resembles how it behaves on AVX-256: two vectors of 8x8-bit integers ("64-bit vectors") are multiplied into an intermediate 8x16-bit result (128-bit vector) and added to an 8x32-bit accumulator (256-bit vector). Of course, it still does not use the SDOT instruction, which is sad, as that CPU instruction is intended precisely for this purpose.

   On 128-bit NEON, Java's Vector API offers no way to process 4x8-bit integers ("32-bit vectors") the way the SDOT instruction does: https://developer.arm.com/documentation/102651/a/What-are-dot-product-intructions-

   Nor is it even performant to take a 64-bit vector and process "part 0" then "part 1". The situation is really sad, and the performance reflects that.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
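The two accumulation shapes the comment contrasts can be sketched in plain scalar Java (a hypothetical illustration, not Lucene's actual Vector API code): `wideningDot` mirrors the 8x8-bit multiply into a 16-bit intermediate that is then added into a 32-bit accumulator, while `sdotStyle` mirrors what a single SDOT instruction does in hardware, folding each group of four 8-bit lane products directly into one 32-bit accumulator lane.

```java
// Hypothetical scalar sketch of the two integer dot-product shapes discussed above.
public class DotSketch {
    // Shape the Vector API effectively produces on 256-bit SVE / AVX-256:
    // each 8-bit x 8-bit product is widened to a 16-bit intermediate,
    // then accumulated into a 32-bit sum.
    static int wideningDot(byte[] a, byte[] b) {
        int acc = 0;
        for (int i = 0; i < a.length; i++) {
            short product = (short) (a[i] * b[i]); // 8x8 -> 16-bit intermediate
            acc += product;                        // added into 32-bit accumulator
        }
        return acc;
    }

    // Shape of the SDOT instruction: each 32-bit accumulator lane absorbs the
    // dot product of a group of four 8-bit lanes in one step
    // (assumes the length is a multiple of 4).
    static int sdotStyle(byte[] a, byte[] b) {
        int acc = 0;
        for (int i = 0; i < a.length; i += 4) {
            acc += a[i] * b[i] + a[i + 1] * b[i + 1]
                 + a[i + 2] * b[i + 2] + a[i + 3] * b[i + 3];
        }
        return acc;
    }

    public static void main(String[] args) {
        byte[] a = {1, -2, 3, 4, 5, 6, 7, 8};
        byte[] b = {8, 7, 6, 5, 4, -3, 2, 1};
        System.out.println(wideningDot(a, b)); // → 56
        System.out.println(sdotStyle(a, b));   // → 56 (same result, fewer steps in hardware)
    }
}
```

Both methods compute the same value; the point of SDOT is that the grouped form is a single hardware instruction per four lanes, which the Vector API currently cannot express.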