alexdegroot commented on issue #45559:
URL: https://github.com/apache/arrow/issues/45559#issuecomment-2721398755

   @CurtHagenlocher I've verified the difference
[using this test](https://gist.github.com/alexdegroot/0b431543f8335a483e01957e6df35a60).
The improvement is negligible (see the table below) unless you go for the
variant without the bitmask (`OptimizedImplementationWithoutBitmask`); in that
case the improvement is significant (roughly 30%).
   
   I can imagine that the implementation using the shift operator could be
interesting once it is well tested. What would remain is deciding how to handle
the break in the contract.
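
For readers without the gist at hand, here is a minimal sketch of what three such bit-read variants might look like. It is an illustration only: the class name, method names, and lookup table are hypothetical and are not taken from the gist or from the Arrow code base, and the mapping to the benchmark method names is an assumption.

```csharp
using System;

// Hypothetical names throughout; this is not the code from the linked gist.
public static class BitReadSketch
{
    // Lookup table with one set bit per position within a byte.
    private static readonly byte[] BitMask = { 1, 2, 4, 8, 16, 32, 64, 128 };

    // Division and modulo select the byte and the mask entry.
    public static bool WithDivision(ReadOnlySpan<byte> data, int index) =>
        (data[index / 8] & BitMask[index % 8]) != 0;

    // Same lookup table, but the index arithmetic uses shift and AND.
    public static bool WithShiftAndMaskTable(ReadOnlySpan<byte> data, int index) =>
        (data[index >> 3] & BitMask[index & 7]) != 0;

    // No lookup table: shift the selected byte right and test its lowest bit.
    public static bool WithoutMaskTable(ReadOnlySpan<byte> data, int index) =>
        ((data[index >> 3] >> (index & 7)) & 1) != 0;
}
```

For non-negative, in-range indices all three return the same value; any difference for other inputs would need to be checked against the real implementations.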
   
   ```
   
   BenchmarkDotNet v0.14.0, Ubuntu 20.04.6 LTS (Focal Fossa) (container)
   AMD EPYC 7763, 1 CPU, 8 logical and 4 physical cores
   .NET SDK 8.0.404
     [Host]     : .NET 8.0.11 (8.0.1124.51707), X64 RyuJIT AVX2
     DefaultJob : .NET 8.0.11 (8.0.1124.51707), X64 RyuJIT AVX2
   
   
   ```
   | Method                                | data    | index | Mean      | Error     | StdDev    |
   |-------------------------------------- |-------- |------ |----------:|----------:|----------:|
   | **OriginalImplementation**                | **Byte[4]** | **0**     | **0.9634 ns** | **0.0393 ns** | **0.0348 ns** |
   | OriginalImplementation                | Byte[4] | 0     | 1.0225 ns | 0.0238 ns | 0.0223 ns |
   | OriginalImplementation                | Byte[4] | 0     | 0.9715 ns | 0.0247 ns | 0.0231 ns |
   | OriginalImplementation                | Byte[4] | 0     | 0.9581 ns | 0.0113 ns | 0.0101 ns |
   | OptimizedImplementation               | Byte[4] | 0     | 0.9651 ns | 0.0208 ns | 0.0195 ns |
   | OptimizedImplementation               | Byte[4] | 0     | 0.9719 ns | 0.0281 ns | 0.0249 ns |
   | OptimizedImplementation               | Byte[4] | 0     | 0.9591 ns | 0.0143 ns | 0.0133 ns |
   | OptimizedImplementation               | Byte[4] | 0     | 0.9717 ns | 0.0265 ns | 0.0248 ns |
   | OptimizedImplementationWithoutBitmask | Byte[4] | 0     | 0.6464 ns | 0.0405 ns | 0.0378 ns |
   | OptimizedImplementationWithoutBitmask | Byte[4] | 0     | 0.6407 ns | 0.0091 ns | 0.0081 ns |
   | OptimizedImplementationWithoutBitmask | Byte[4] | 0     | 0.6314 ns | 0.0121 ns | 0.0101 ns |
   | OptimizedImplementationWithoutBitmask | Byte[4] | 0     | 0.6593 ns | 0.0289 ns | 0.0270 ns |
   | **OriginalImplementation**                | **Byte[4]** | **1**     | **0.9353 ns** | **0.0098 ns** | **0.0087 ns** |
   | OriginalImplementation                | Byte[4] | 1     | 0.9389 ns | 0.0127 ns | 0.0106 ns |
   | OriginalImplementation                | Byte[4] | 1     | 0.9711 ns | 0.0244 ns | 0.0228 ns |
   | OriginalImplementation                | Byte[4] | 1     | 0.9653 ns | 0.0332 ns | 0.0311 ns |
   | OptimizedImplementation               | Byte[4] | 1     | 0.9448 ns | 0.0294 ns | 0.0261 ns |
   | OptimizedImplementation               | Byte[4] | 1     | 0.9390 ns | 0.0288 ns | 0.0255 ns |
   | OptimizedImplementation               | Byte[4] | 1     | 0.9697 ns | 0.0190 ns | 0.0169 ns |
   | OptimizedImplementation               | Byte[4] | 1     | 0.9809 ns | 0.0382 ns | 0.0357 ns |
   | OptimizedImplementationWithoutBitmask | Byte[4] | 1     | 0.6363 ns | 0.0274 ns | 0.0256 ns |
   | OptimizedImplementationWithoutBitmask | Byte[4] | 1     | 0.6478 ns | 0.0218 ns | 0.0204 ns |
   | OptimizedImplementationWithoutBitmask | Byte[4] | 1     | 0.6282 ns | 0.0174 ns | 0.0154 ns |
   | OptimizedImplementationWithoutBitmask | Byte[4] | 1     | 0.6276 ns | 0.0098 ns | 0.0076 ns |
   | **OriginalImplementation**                | **Byte[4]** | **2**     | **0.9612 ns** | **0.0211 ns** | **0.0176 ns** |
   | OriginalImplementation                | Byte[4] | 2     | 0.9316 ns | 0.0108 ns | 0.0084 ns |
   | OriginalImplementation                | Byte[4] | 2     | 0.9608 ns | 0.0211 ns | 0.0187 ns |
   | OriginalImplementation                | Byte[4] | 2     | 0.9621 ns | 0.0293 ns | 0.0274 ns |
   | OptimizedImplementation               | Byte[4] | 2     | 0.9457 ns | 0.0247 ns | 0.0219 ns |
   | OptimizedImplementation               | Byte[4] | 2     | 0.9586 ns | 0.0201 ns | 0.0178 ns |
   | OptimizedImplementation               | Byte[4] | 2     | 0.9445 ns | 0.0107 ns | 0.0090 ns |
   | OptimizedImplementation               | Byte[4] | 2     | 0.9643 ns | 0.0267 ns | 0.0237 ns |
   | OptimizedImplementationWithoutBitmask | Byte[4] | 2     | 0.6373 ns | 0.0261 ns | 0.0244 ns |
   | OptimizedImplementationWithoutBitmask | Byte[4] | 2     | 0.6471 ns | 0.0245 ns | 0.0229 ns |
   | OptimizedImplementationWithoutBitmask | Byte[4] | 2     | 0.6721 ns | 0.0379 ns | 0.0354 ns |
   | OptimizedImplementationWithoutBitmask | Byte[4] | 2     | 0.6404 ns | 0.0170 ns | 0.0151 ns |
   | **OriginalImplementation**                | **Byte[4]** | **3**     | **0.9693 ns** | **0.0269 ns** | **0.0238 ns** |
   | OriginalImplementation                | Byte[4] | 3     | 0.9532 ns | 0.0246 ns | 0.0230 ns |
   | OriginalImplementation                | Byte[4] | 3     | 0.9267 ns | 0.0152 ns | 0.0134 ns |
   | OriginalImplementation                | Byte[4] | 3     | 0.9579 ns | 0.0380 ns | 0.0355 ns |
   | OptimizedImplementation               | Byte[4] | 3     | 0.9399 ns | 0.0158 ns | 0.0140 ns |
   | OptimizedImplementation               | Byte[4] | 3     | 0.9556 ns | 0.0108 ns | 0.0090 ns |
   | OptimizedImplementation               | Byte[4] | 3     | 0.9447 ns | 0.0127 ns | 0.0113 ns |
   | OptimizedImplementation               | Byte[4] | 3     | 0.9561 ns | 0.0375 ns | 0.0351 ns |
   | OptimizedImplementationWithoutBitmask | Byte[4] | 3     | 0.6315 ns | 0.0142 ns | 0.0119 ns |
   | OptimizedImplementationWithoutBitmask | Byte[4] | 3     | 0.6176 ns | 0.0031 ns | 0.0024 ns |
   | OptimizedImplementationWithoutBitmask | Byte[4] | 3     | 0.6213 ns | 0.0016 ns | 0.0013 ns |
   | OptimizedImplementationWithoutBitmask | Byte[4] | 3     | 0.6195 ns | 0.0112 ns | 0.0088 ns |
   | **OriginalImplementation**                | **Byte[4]** | **4**     | **0.9517 ns** | **0.0117 ns** | **0.0091 ns** |
   | OriginalImplementation                | Byte[4] | 4     | 0.9563 ns | 0.0297 ns | 0.0278 ns |
   | OriginalImplementation                | Byte[4] | 4     | 0.9429 ns | 0.0254 ns | 0.0226 ns |
   | OriginalImplementation                | Byte[4] | 4     | 0.9541 ns | 0.0204 ns | 0.0181 ns |
   | OptimizedImplementation               | Byte[4] | 4     | 0.9617 ns | 0.0165 ns | 0.0129 ns |
   | OptimizedImplementation               | Byte[4] | 4     | 0.9408 ns | 0.0040 ns | 0.0031 ns |
   | OptimizedImplementation               | Byte[4] | 4     | 0.9623 ns | 0.0179 ns | 0.0158 ns |
   | OptimizedImplementation               | Byte[4] | 4     | 0.9376 ns | 0.0122 ns | 0.0114 ns |
   | OptimizedImplementationWithoutBitmask | Byte[4] | 4     | 0.6645 ns | 0.0349 ns | 0.0326 ns |
   | OptimizedImplementationWithoutBitmask | Byte[4] | 4     | 0.6279 ns | 0.0221 ns | 0.0207 ns |
   | OptimizedImplementationWithoutBitmask | Byte[4] | 4     | 0.6446 ns | 0.0155 ns | 0.0138 ns |
   | OptimizedImplementationWithoutBitmask | Byte[4] | 4     | 0.6564 ns | 0.0237 ns | 0.0222 ns |
   | **OriginalImplementation**                | **Byte[4]** | **5**     | **0.9396 ns** | **0.0126 ns** | **0.0105 ns** |
   | OriginalImplementation                | Byte[4] | 5     | 0.9703 ns | 0.0304 ns | 0.0284 ns |
   | OriginalImplementation                | Byte[4] | 5     | 0.9645 ns | 0.0394 ns | 0.0349 ns |
   | OriginalImplementation                | Byte[4] | 5     | 0.9678 ns | 0.0340 ns | 0.0318 ns |
   | OptimizedImplementation               | Byte[4] | 5     | 0.9644 ns | 0.0303 ns | 0.0283 ns |
   | OptimizedImplementation               | Byte[4] | 5     | 0.9495 ns | 0.0116 ns | 0.0097 ns |
   | OptimizedImplementation               | Byte[4] | 5     | 0.9588 ns | 0.0119 ns | 0.0105 ns |
   | OptimizedImplementation               | Byte[4] | 5     | 0.9626 ns | 0.0331 ns | 0.0310 ns |
   | OptimizedImplementationWithoutBitmask | Byte[4] | 5     | 0.6488 ns | 0.0301 ns | 0.0267 ns |
   | OptimizedImplementationWithoutBitmask | Byte[4] | 5     | 0.6295 ns | 0.0292 ns | 0.0273 ns |
   | OptimizedImplementationWithoutBitmask | Byte[4] | 5     | 0.6530 ns | 0.0270 ns | 0.0252 ns |
   | OptimizedImplementationWithoutBitmask | Byte[4] | 5     | 0.6219 ns | 0.0213 ns | 0.0189 ns |
   | **OriginalImplementation**                | **Byte[4]** | **6**     | **0.9454 ns** | **0.0251 ns** | **0.0222 ns** |
   | OriginalImplementation                | Byte[4] | 6     | 0.9430 ns | 0.0325 ns | 0.0304 ns |
   | OriginalImplementation                | Byte[4] | 6     | 0.9730 ns | 0.0303 ns | 0.0269 ns |
   | OriginalImplementation                | Byte[4] | 6     | 0.9712 ns | 0.0295 ns | 0.0275 ns |
   | OptimizedImplementation               | Byte[4] | 6     | 0.9485 ns | 0.0371 ns | 0.0347 ns |
   | OptimizedImplementation               | Byte[4] | 6     | 0.9520 ns | 0.0138 ns | 0.0115 ns |
   | OptimizedImplementation               | Byte[4] | 6     | 0.9588 ns | 0.0172 ns | 0.0152 ns |
   | OptimizedImplementation               | Byte[4] | 6     | 0.9685 ns | 0.0307 ns | 0.0288 ns |
   | OptimizedImplementationWithoutBitmask | Byte[4] | 6     | 0.6395 ns | 0.0257 ns | 0.0228 ns |
   | OptimizedImplementationWithoutBitmask | Byte[4] | 6     | 0.6674 ns | 0.0386 ns | 0.0361 ns |
   | OptimizedImplementationWithoutBitmask | Byte[4] | 6     | 0.6202 ns | 0.0227 ns | 0.0189 ns |
   | OptimizedImplementationWithoutBitmask | Byte[4] | 6     | 0.6178 ns | 0.0146 ns | 0.0122 ns |
   | **OriginalImplementation**                | **Byte[4]** | **7**     | **0.9574 ns** | **0.0361 ns** | **0.0320 ns** |
   | OriginalImplementation                | Byte[4] | 7     | 0.9567 ns | 0.0294 ns | 0.0275 ns |
   | OriginalImplementation                | Byte[4] | 7     | 0.9446 ns | 0.0140 ns | 0.0124 ns |
   | OriginalImplementation                | Byte[4] | 7     | 0.9468 ns | 0.0160 ns | 0.0150 ns |
   | OptimizedImplementation               | Byte[4] | 7     | 0.9792 ns | 0.0398 ns | 0.0333 ns |
   | OptimizedImplementation               | Byte[4] | 7     | 0.9617 ns | 0.0262 ns | 0.0233 ns |
   | OptimizedImplementation               | Byte[4] | 7     | 0.9654 ns | 0.0328 ns | 0.0307 ns |
   | OptimizedImplementation               | Byte[4] | 7     | 1.0226 ns | 0.0559 ns | 0.0644 ns |
   | OptimizedImplementationWithoutBitmask | Byte[4] | 7     | 0.6351 ns | 0.0190 ns | 0.0168 ns |
   | OptimizedImplementationWithoutBitmask | Byte[4] | 7     | 0.6559 ns | 0.0346 ns | 0.0307 ns |
   | OptimizedImplementationWithoutBitmask | Byte[4] | 7     | 0.6266 ns | 0.0267 ns | 0.0249 ns |
   | OptimizedImplementationWithoutBitmask | Byte[4] | 7     | 0.6339 ns | 0.0152 ns | 0.0135 ns |
   | **OriginalImplementation**                | **Byte[4]** | **8**     | **0.9534 ns** | **0.0381 ns** | **0.0338 ns** |
   | OriginalImplementation                | Byte[4] | 8     | 0.9479 ns | 0.0137 ns | 0.0121 ns |
   | OriginalImplementation                | Byte[4] | 8     | 0.9572 ns | 0.0218 ns | 0.0193 ns |
   | OriginalImplementation                | Byte[4] | 8     | 0.9467 ns | 0.0283 ns | 0.0264 ns |
   | OptimizedImplementation               | Byte[4] | 8     | 0.9570 ns | 0.0117 ns | 0.0104 ns |
   | OptimizedImplementation               | Byte[4] | 8     | 0.9389 ns | 0.0120 ns | 0.0094 ns |
   | OptimizedImplementation               | Byte[4] | 8     | 0.9674 ns | 0.0269 ns | 0.0238 ns |
   | OptimizedImplementation               | Byte[4] | 8     | 0.9719 ns | 0.0272 ns | 0.0255 ns |
   | OptimizedImplementationWithoutBitmask | Byte[4] | 8     | 0.6073 ns | 0.0177 ns | 0.0147 ns |
   | OptimizedImplementationWithoutBitmask | Byte[4] | 8     | 0.6418 ns | 0.0307 ns | 0.0287 ns |
   | OptimizedImplementationWithoutBitmask | Byte[4] | 8     | 0.6082 ns | 0.0090 ns | 0.0070 ns |
   | OptimizedImplementationWithoutBitmask | Byte[4] | 8     | 0.6678 ns | 0.0262 ns | 0.0232 ns |
   | **OriginalImplementation**                | **Byte[4]** | **15**    | **0.9646 ns** | **0.0303 ns** | **0.0283 ns** |
   | OriginalImplementation                | Byte[4] | 15    | 0.9766 ns | 0.0294 ns | 0.0275 ns |
   | OriginalImplementation                | Byte[4] | 15    | 0.9556 ns | 0.0270 ns | 0.0253 ns |
   | OriginalImplementation                | Byte[4] | 15    | 0.9633 ns | 0.0358 ns | 0.0334 ns |
   | OptimizedImplementation               | Byte[4] | 15    | 0.9718 ns | 0.0255 ns | 0.0226 ns |
   | OptimizedImplementation               | Byte[4] | 15    | 0.9550 ns | 0.0255 ns | 0.0239 ns |
   | OptimizedImplementation               | Byte[4] | 15    | 0.9432 ns | 0.0134 ns | 0.0105 ns |
   | OptimizedImplementation               | Byte[4] | 15    | 0.9680 ns | 0.0343 ns | 0.0304 ns |
   | OptimizedImplementationWithoutBitmask | Byte[4] | 15    | 0.6284 ns | 0.0125 ns | 0.0105 ns |
   | OptimizedImplementationWithoutBitmask | Byte[4] | 15    | 0.6374 ns | 0.0215 ns | 0.0191 ns |
   | OptimizedImplementationWithoutBitmask | Byte[4] | 15    | 0.6784 ns | 0.0491 ns | 0.0459 ns |
   | OptimizedImplementationWithoutBitmask | Byte[4] | 15    | 0.6479 ns | 0.0286 ns | 0.0267 ns |
   | **OriginalImplementation**                | **Byte[4]** | **16**    | **0.9385 ns** | **0.0285 ns** | **0.0267 ns** |
   | OriginalImplementation                | Byte[4] | 16    | 0.9602 ns | 0.0260 ns | 0.0243 ns |
   | OriginalImplementation                | Byte[4] | 16    | 0.9619 ns | 0.0118 ns | 0.0099 ns |
   | OriginalImplementation                | Byte[4] | 16    | 0.9476 ns | 0.0334 ns | 0.0312 ns |
   | OptimizedImplementation               | Byte[4] | 16    | 0.9368 ns | 0.0199 ns | 0.0186 ns |
   | OptimizedImplementation               | Byte[4] | 16    | 0.9760 ns | 0.0277 ns | 0.0259 ns |
   | OptimizedImplementation               | Byte[4] | 16    | 0.9364 ns | 0.0231 ns | 0.0205 ns |
   | OptimizedImplementation               | Byte[4] | 16    | 0.9792 ns | 0.0300 ns | 0.0281 ns |
   | OptimizedImplementationWithoutBitmask | Byte[4] | 16    | 0.6423 ns | 0.0326 ns | 0.0305 ns |
   | OptimizedImplementationWithoutBitmask | Byte[4] | 16    | 0.6206 ns | 0.0258 ns | 0.0216 ns |
   | OptimizedImplementationWithoutBitmask | Byte[4] | 16    | 0.6500 ns | 0.0327 ns | 0.0306 ns |
   | OptimizedImplementationWithoutBitmask | Byte[4] | 16    | 0.6566 ns | 0.0253 ns | 0.0224 ns |
   | **OriginalImplementation**                | **Byte[4]** | **23**    | **0.9637 ns** | **0.0147 ns** | **0.0131 ns** |
   | OriginalImplementation                | Byte[4] | 23    | 0.9735 ns | 0.0246 ns | 0.0206 ns |
   | OriginalImplementation                | Byte[4] | 23    | 0.9370 ns | 0.0241 ns | 0.0214 ns |
   | OriginalImplementation                | Byte[4] | 23    | 0.9686 ns | 0.0350 ns | 0.0327 ns |
   | OptimizedImplementation               | Byte[4] | 23    | 0.9448 ns | 0.0240 ns | 0.0224 ns |
   | OptimizedImplementation               | Byte[4] | 23    | 0.9803 ns | 0.0300 ns | 0.0280 ns |
   | OptimizedImplementation               | Byte[4] | 23    | 0.9618 ns | 0.0275 ns | 0.0244 ns |
   | OptimizedImplementation               | Byte[4] | 23    | 0.9719 ns | 0.0347 ns | 0.0325 ns |
   | OptimizedImplementationWithoutBitmask | Byte[4] | 23    | 0.6637 ns | 0.0207 ns | 0.0173 ns |
   | OptimizedImplementationWithoutBitmask | Byte[4] | 23    | 0.6433 ns | 0.0311 ns | 0.0276 ns |
   | OptimizedImplementationWithoutBitmask | Byte[4] | 23    | 0.6435 ns | 0.0259 ns | 0.0242 ns |
   | OptimizedImplementationWithoutBitmask | Byte[4] | 23    | 0.6406 ns | 0.0248 ns | 0.0232 ns |
   | **OriginalImplementation**                | **Byte[4]** | **24**    | **0.9558 ns** | **0.0332 ns** | **0.0311 ns** |
   | OriginalImplementation                | Byte[4] | 24    | 2.9375 ns | 0.0118 ns | 0.0110 ns |
   | OriginalImplementation                | Byte[4] | 24    | 1.0276 ns | 0.0519 ns | 0.0710 ns |
   | OriginalImplementation                | Byte[4] | 24    | 0.9618 ns | 0.0252 ns | 0.0235 ns |
   | OptimizedImplementation               | Byte[4] | 24    | 0.9715 ns | 0.0241 ns | 0.0214 ns |
   | OptimizedImplementation               | Byte[4] | 24    | 0.9552 ns | 0.0337 ns | 0.0315 ns |
   | OptimizedImplementation               | Byte[4] | 24    | 0.9812 ns | 0.0299 ns | 0.0265 ns |
   | OptimizedImplementation               | Byte[4] | 24    | 0.9626 ns | 0.0476 ns | 0.0445 ns |
   | OptimizedImplementationWithoutBitmask | Byte[4] | 24    | 0.6424 ns | 0.0222 ns | 0.0208 ns |
   | OptimizedImplementationWithoutBitmask | Byte[4] | 24    | 0.3069 ns | 0.0375 ns | 0.0351 ns |
   | OptimizedImplementationWithoutBitmask | Byte[4] | 24    | 0.6667 ns | 0.0271 ns | 0.0254 ns |
   | OptimizedImplementationWithoutBitmask | Byte[4] | 24    | 0.6397 ns | 0.0221 ns | 0.0207 ns |
   | **OriginalImplementation**                | **Byte[4]** | **31**    | **0.9554 ns** | **0.0088 ns** | **0.0073 ns** |
   | OriginalImplementation                | Byte[4] | 31    | 0.9743 ns | 0.0231 ns | 0.0216 ns |
   | OriginalImplementation                | Byte[4] | 31    | 0.9627 ns | 0.0400 ns | 0.0374 ns |
   | OriginalImplementation                | Byte[4] | 31    | 0.9622 ns | 0.0239 ns | 0.0212 ns |
   | OptimizedImplementation               | Byte[4] | 31    | 0.9373 ns | 0.0258 ns | 0.0241 ns |
   | OptimizedImplementation               | Byte[4] | 31    | 0.9444 ns | 0.0249 ns | 0.0233 ns |
   | OptimizedImplementation               | Byte[4] | 31    | 0.9715 ns | 0.0291 ns | 0.0272 ns |
   | OptimizedImplementation               | Byte[4] | 31    | 0.9542 ns | 0.0282 ns | 0.0264 ns |
   | OptimizedImplementationWithoutBitmask | Byte[4] | 31    | 0.6494 ns | 0.0147 ns | 0.0122 ns |
   | OptimizedImplementationWithoutBitmask | Byte[4] | 31    | 0.6373 ns | 0.0202 ns | 0.0179 ns |
   | OptimizedImplementationWithoutBitmask | Byte[4] | 31    | 0.6547 ns | 0.0243 ns | 0.0227 ns |
   | OptimizedImplementationWithoutBitmask | Byte[4] | 31    | 0.6624 ns | 0.0269 ns | 0.0252 ns |
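
The size of the improvement can be checked directly from the means in the table. The helper below is a hypothetical sketch (its name and shape are not from the gist) that computes the fraction of the baseline time saved.

```csharp
using System;

// Hypothetical helper, not part of the benchmark gist.
public static class SpeedupSketch
{
    // Fraction of the baseline time saved by the candidate;
    // 0.25 means the candidate takes 25% less time.
    public static double RelativeReduction(double baselineNs, double candidateNs)
    {
        if (baselineNs <= 0)
            throw new ArgumentOutOfRangeException(nameof(baselineNs));
        return 1.0 - candidateNs / baselineNs;
    }
}
```

For the first index 0 rows, `SpeedupSketch.RelativeReduction(0.9634, 0.6464)` comes out at roughly 0.33, in line with the figure quoted above the table.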

