mikemccand commented on PR #15341:
URL: https://github.com/apache/lucene/pull/15341#issuecomment-3422697948
Raptor Lake box is i9-13900K:
```
processor : 31
vendor_id : GenuineIntel
cpu family : 6
model : 183
model name : 13th Gen Intel(R) Core(TM) i9-13900K
stepping : 1
microcode : 0x12f
cpu MHz : 800.000
cache size : 36864 KB
physical id : 0
siblings : 32
core id : 47
cpu cores : 24
apicid : 94
initial apicid : 94
fpu : yes
fpu_exception : yes
cpuid level : 32
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx
pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl
xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64
monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2
x2apic movbe
popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch
cpuid_fault ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow flexpriority ept vpid
ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap
clflushopt clwb intel_pt sha_ni xsaveopt xsavec xgetbv1 xsaves split_lock_detect
user_shstk avx_vnni dtherm ida arat pln pts hwp hwp_notify hwp_act_window
hwp_epp hwp_pkg_req hfi vnmi umip pku ospke waitpkg gfni vaes vpclmulqdq rdpid
movdiri movdir64b fsrm md_clear serialize pconfig arch_lbr ibt flush_l1d
arch_capabilities
vmx flags : vnmi preemption_timer posted_intr invvpid ept_x_only
ept_ad ept_1gb flexpriority apicv tsc_offset vtpr mtf vapic ept vpid
unrestricted_guest vapic_reg vid ple shadow_vmcs ept_violation_ve
ept_mode_based_exec tsc_scaling usr_wait_pause
bugs : spectre_v1 spectre_v2 spec_store_bypass swapgs eibrs_pbrsb
rfds bhi spectre_v2_user
bogomips : 5990.40
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
power management:
```
Results:
```
Benchmark                                      (padBytes)  (size)   Mode  Cnt   Score   Error   Units
VectorScorerBenchmark.binaryDotProductDefault           0     256  thrpt   15  14.037 ± 0.061  ops/us
VectorScorerBenchmark.binaryDotProductDefault           1     256  thrpt   15  14.046 ± 0.071  ops/us
VectorScorerBenchmark.binaryDotProductDefault           2     256  thrpt   15  14.139 ± 0.089  ops/us
VectorScorerBenchmark.binaryDotProductDefault           4     256  thrpt   15  14.069 ± 0.040  ops/us
VectorScorerBenchmark.binaryDotProductDefault           6     256  thrpt   15  14.038 ± 0.072  ops/us
VectorScorerBenchmark.binaryDotProductDefault           8     256  thrpt   15  14.094 ± 0.070  ops/us
VectorScorerBenchmark.binaryDotProductDefault          16     256  thrpt   15  14.073 ± 0.059  ops/us
VectorScorerBenchmark.binaryDotProductDefault          20     256  thrpt   15  14.134 ± 0.075  ops/us
VectorScorerBenchmark.binaryDotProductDefault          32     256  thrpt   15  14.016 ± 0.044  ops/us
VectorScorerBenchmark.binaryDotProductDefault          50     256  thrpt   15  14.031 ± 0.046  ops/us
VectorScorerBenchmark.binaryDotProductDefault          64     256  thrpt   15  14.082 ± 0.068  ops/us
VectorScorerBenchmark.binaryDotProductDefault         100     256  thrpt   15  14.013 ± 0.059  ops/us
VectorScorerBenchmark.binaryDotProductDefault         128     256  thrpt   15  14.079 ± 0.069  ops/us
VectorScorerBenchmark.binaryDotProductDefault         255     256  thrpt   15  14.143 ± 0.074  ops/us
VectorScorerBenchmark.binaryDotProductDefault         256     256  thrpt   15  14.026 ± 0.028  ops/us
VectorScorerBenchmark.binaryDotProductMemSeg            0     256  thrpt   15  49.305 ± 0.244  ops/us
VectorScorerBenchmark.binaryDotProductMemSeg            1     256  thrpt   15  48.572 ± 0.030  ops/us
VectorScorerBenchmark.binaryDotProductMemSeg            2     256  thrpt   15  48.508 ± 0.198  ops/us
VectorScorerBenchmark.binaryDotProductMemSeg            4     256  thrpt   15  48.636 ± 0.094  ops/us
VectorScorerBenchmark.binaryDotProductMemSeg            6     256  thrpt   15  48.536 ± 0.185  ops/us
VectorScorerBenchmark.binaryDotProductMemSeg            8     256  thrpt   15  49.346 ± 0.166  ops/us
VectorScorerBenchmark.binaryDotProductMemSeg           16     256  thrpt   15  49.419 ± 0.102  ops/us
VectorScorerBenchmark.binaryDotProductMemSeg           20     256  thrpt   15  49.224 ± 0.396  ops/us
VectorScorerBenchmark.binaryDotProductMemSeg           32     256  thrpt   15  49.423 ± 0.134  ops/us
VectorScorerBenchmark.binaryDotProductMemSeg           50     256  thrpt   15  48.676 ± 0.167  ops/us
VectorScorerBenchmark.binaryDotProductMemSeg           64     256  thrpt   15  49.060 ± 0.866  ops/us
VectorScorerBenchmark.binaryDotProductMemSeg          100     256  thrpt   15  49.181 ± 0.210  ops/us
VectorScorerBenchmark.binaryDotProductMemSeg          128     256  thrpt   15  49.444 ± 0.082  ops/us
VectorScorerBenchmark.binaryDotProductMemSeg          255     256  thrpt   15  48.362 ± 0.163  ops/us
VectorScorerBenchmark.binaryDotProductMemSeg          256     256  thrpt   15  48.169 ± 5.291  ops/us
VectorScorerBenchmark.floatDotProductDefault            0     256  thrpt   15  23.215 ± 0.023  ops/us
VectorScorerBenchmark.floatDotProductDefault            1     256  thrpt   15  23.207 ± 0.067  ops/us
VectorScorerBenchmark.floatDotProductDefault            2     256  thrpt   15  23.181 ± 0.086  ops/us
VectorScorerBenchmark.floatDotProductDefault            4     256  thrpt   15  23.156 ± 0.290  ops/us
VectorScorerBenchmark.floatDotProductDefault            6     256  thrpt   15  23.232 ± 0.012  ops/us
VectorScorerBenchmark.floatDotProductDefault            8     256  thrpt   15  23.215 ± 0.091  ops/us
VectorScorerBenchmark.floatDotProductDefault           16     256  thrpt   15  23.194 ± 0.071  ops/us
VectorScorerBenchmark.floatDotProductDefault           20     256  thrpt   15  23.202 ± 0.083  ops/us
VectorScorerBenchmark.floatDotProductDefault           32     256  thrpt   15  23.207 ± 0.048  ops/us
VectorScorerBenchmark.floatDotProductDefault           50     256  thrpt   15  23.227 ± 0.031  ops/us
VectorScorerBenchmark.floatDotProductDefault           64     256  thrpt   15  23.187 ± 0.095  ops/us
VectorScorerBenchmark.floatDotProductDefault          100     256  thrpt   15  23.246 ± 0.114  ops/us
VectorScorerBenchmark.floatDotProductDefault          128     256  thrpt   15  23.214 ± 0.077  ops/us
VectorScorerBenchmark.floatDotProductDefault          255     256  thrpt   15  23.212 ± 0.035  ops/us
VectorScorerBenchmark.floatDotProductDefault          256     256  thrpt   15  23.239 ± 0.117  ops/us
VectorScorerBenchmark.floatDotProductMemSeg             0     256  thrpt   15  53.514 ± 5.159  ops/us
VectorScorerBenchmark.floatDotProductMemSeg             1     256  thrpt   15  49.594 ± 3.885  ops/us
VectorScorerBenchmark.floatDotProductMemSeg             2     256  thrpt   15  50.504 ± 0.122  ops/us
VectorScorerBenchmark.floatDotProductMemSeg             4     256  thrpt   15  51.385 ± 4.406  ops/us
VectorScorerBenchmark.floatDotProductMemSeg             6     256  thrpt   15  50.497 ± 0.146  ops/us
VectorScorerBenchmark.floatDotProductMemSeg             8     256  thrpt   15  52.327 ± 0.292  ops/us
VectorScorerBenchmark.floatDotProductMemSeg            16     256  thrpt   15  51.401 ± 4.426  ops/us
VectorScorerBenchmark.floatDotProductMemSeg            20     256  thrpt   15  52.373 ± 0.307  ops/us
VectorScorerBenchmark.floatDotProductMemSeg            32     256  thrpt   15  54.779 ± 0.078  ops/us
VectorScorerBenchmark.floatDotProductMemSeg            50     256  thrpt   15  49.447 ± 1.502  ops/us
VectorScorerBenchmark.floatDotProductMemSeg            64     256  thrpt   15  54.788 ± 0.060  ops/us
VectorScorerBenchmark.floatDotProductMemSeg           100     256  thrpt   15  51.600 ± 0.352  ops/us
VectorScorerBenchmark.floatDotProductMemSeg           128     256  thrpt   15  54.650 ± 0.377  ops/us
VectorScorerBenchmark.floatDotProductMemSeg           255     256  thrpt   15  50.042 ± 0.167  ops/us
VectorScorerBenchmark.floatDotProductMemSeg           256     256  thrpt   15  54.583 ± 0.399  ops/us
```
There might be some small misalignment penalty for float SIMD?
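
One rough way to check that hunch against the numbers above is to split the `floatDotProductMemSeg` scores by whether `padBytes` is a multiple of the vector width and compare the means. The sketch below is only illustrative. It assumes 32-byte (256-bit AVX2) vectors, since the flags list shows `avx2` but no AVX-512, and it assumes `padBytes` directly shifts the start of the vector data relative to a 32-byte boundary, which the benchmark source would need to confirm. The names `FLOAT_MEMSEG`, `VECTOR_BYTES` and `split_by_alignment` are made up for this example.

```python
from statistics import mean

# floatDotProductMemSeg scores (ops/us) keyed by padBytes,
# copied from the JMH results above.
FLOAT_MEMSEG = {
    0: 53.514, 1: 49.594, 2: 50.504, 4: 51.385, 6: 50.497,
    8: 52.327, 16: 51.401, 20: 52.373, 32: 54.779, 50: 49.447,
    64: 54.788, 100: 51.600, 128: 54.650, 255: 50.042, 256: 54.583,
}

# Assumption: 256-bit AVX2 vectors, i.e. 32 bytes per vector load.
VECTOR_BYTES = 32


def split_by_alignment(scores, vector_bytes=VECTOR_BYTES):
    """Return (mean of aligned scores, mean of unaligned scores).

    A padBytes value counts as "aligned" when it is a multiple of
    vector_bytes; this is a hypothesis about the benchmark layout,
    not something the results themselves prove.
    """
    aligned = [s for pad, s in scores.items() if pad % vector_bytes == 0]
    unaligned = [s for pad, s in scores.items() if pad % vector_bytes != 0]
    return mean(aligned), mean(unaligned)


aligned_mean, unaligned_mean = split_by_alignment(FLOAT_MEMSEG)
gap_pct = 100.0 * (1.0 - unaligned_mean / aligned_mean)
print(f"aligned:   {aligned_mean:.2f} ops/us")
print(f"unaligned: {unaligned_mean:.2f} ops/us")
print(f"gap:       {gap_pct:.1f}%")
```

Several rows (padBytes 0, 1, 4 and 16) carry errors of roughly ±4 to ±5 ops/us, so any gap this shows should be read as a hint rather than a measurement. The binary MemSeg rows could be run through the same split to see whether the pattern is specific to floats.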