westonpace commented on PR #12928:
URL: https://github.com/apache/arrow/pull/12928#issuecomment-1105955726

   I played around with this a bit more.  I can reproduce it locally by 
building with SSE4_2:
   
   ```
   cmake .. -DARROW_PARQUET=ON -DARROW_SIMD_LEVEL=SSE4_2 -DARROW_RUNTIME_SIMD_LEVEL=MAX -DARROW_BUILD_TESTS=ON
   ```
   
   From there it's easiest to verify by just manually checking to see which 
version ends up in `libparquet.so`:
   
   ```
   objdump --disassemble=_ZN5arrow8internal21FirstTimeBitmapWriter10AppendWordEml -S ./minsizerel/libparquet.so.800.0.0
   ```
   If the output contains `shlx` then you've reproduced the bug.  If it only 
contains `shl` then it picked the correct default symbol.  If the method is 
entirely inlined you get no output.
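
To see the `shl` / `shlx` difference in isolation, here is a minimal sketch. `VariableShift` is a hypothetical helper, not Arrow code. On x86-64, building it with `-O2 -mbmi2` will typically produce `shlx` (a BMI2 instruction), while plain `-O2` will typically produce `shl`; the exact output depends on the compiler and version.

```cpp
#include <cstdint>

// Hypothetical helper (not from Arrow): a 64-bit left shift by a run-time count.
// The mask keeps the count in [0, 63] so the shift is always well defined.
uint64_t VariableShift(uint64_t word, unsigned count) {
  return word << (count & 63u);
}
```

Disassembling the resulting object file with `objdump -d` under each set of flags should show which of the two instructions was chosen.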
   
   * The symbol is inlined with `-DCMAKE_BUILD_TYPE=Release`
   * The symbol is not inlined with `-DCMAKE_BUILD_TYPE=MinSizeRel`
     * However, on my system, in all cases, the `libparquet.so` file chooses 
the correct version unless...
     * I can get an invalid `.so` file if I switch the order in which the object files are passed to the linker: `/usr/bin/clang++-13 ... level_conversion_bmi2.cc.o ... level_conversion.cc.o ...`
   * The symbol is inlined, even with `MinSizeRel`, if I try @kou's fix 
(`__attribute__((always_inline))`).
     * This seems like the easiest "spot fix" if we wanted to include something 
as part of 8.0.0
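
For illustration, a minimal sketch of what the `always_inline` spot fix looks like on a small header-defined helper. The macro name `SKETCH_FORCE_INLINE` and the functions `AppendShifted` and `PackTwo` are hypothetical stand-ins, not Arrow's actual code. The idea is that if every call site gets its own inlined copy, there should be no out-of-line symbol left for the linker to choose between.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical macro; only GCC and Clang understand this attribute spelling.
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_FORCE_INLINE inline __attribute__((always_inline))
#else
#define SKETCH_FORCE_INLINE inline
#endif

// Stand-in for a small helper such as AppendWord: OR a shifted word into
// the current value. Forced inlining applies at every call site.
SKETCH_FORCE_INLINE uint64_t AppendShifted(uint64_t current, uint64_t word,
                                           int64_t offset) {
  assert(offset >= 0 && offset < 64);
  return current | (word << offset);
}

// A caller; the helper body is inlined here rather than called by symbol.
uint64_t PackTwo(uint64_t low, uint64_t high, int64_t high_offset) {
  return AppendShifted(AppendShifted(0, low, 0), high, high_offset);
}
```

Running the same `objdump --disassemble=<symbol>` check against the helper's mangled name should then print nothing, since no standalone copy is emitted.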
   
   If you really want to reproduce the issue, I found a tool 
[sde64](https://www.intel.com/content/www/us/en/developer/articles/tool/software-development-emulator.html)
 which will work if you have an Intel processor.  It allows you to simulate 
older Intel processors and so you can pretend to have an Ivy Bridge processor 
(which does not have AVX2/BMI2 support):
   
   ```
   sde64 -ivb -- ./minsizerel/parquet-arrow-test --gtest_filter=TestParquetIO/0.SingleNullableListNullableColumnReadWrite
   Running main() from ../googletest/src/gtest_main.cc
   Note: Google Test filter = TestParquetIO/0.SingleNullableListNullableColumnReadWrite
   [==========] Running 1 test from 1 test suite.
   [----------] Global test environment set-up.
   [----------] 1 test from TestParquetIO/0, where TypeParam = arrow::BooleanType
   [ RUN      ] TestParquetIO/0.SingleNullableListNullableColumnReadWrite
   TID 0 SDE-ERROR: Executed instruction not valid for specified chip (IVYBRIDGE): 0x7f80c95b96b3: shlx rax, rbx, rax
   Image: /home/pace/dev/arrow/cpp/sse4.2-min-build/minsizerel/libparquet.so.800+0x1666b3
   Function: _ZN5arrow8internal21FirstTimeBitmapWriter10AppendWordEml
   Instruction bytes are: c4 e2 f9 f7 c3 
   ```
   
   @pitrou has also posted a suggestion on the mailing list that uses pragmas.  
I had to include the arch specifier so that it generated 
`__attribute__((target("arch=haswell,avx2")))`, and this appears to do what is 
expected: the function is compiled with AVX2 but the nested call is not.
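
A minimal sketch of the per-function target approach, assuming GCC or Clang on x86-64. All names here (`SKETCH_TARGET_HASWELL`, `ShiftOne`, `SumShifted*`) are hypothetical and not taken from Arrow. Only the function carrying the attribute is compiled for Haswell; the shared inline helper keeps the translation unit's baseline flags, so any out-of-line copy of it should be free of BMI2 instructions. Whether the helper gets inlined into the Haswell function is up to the compiler.

```cpp
#include <cassert>
#include <cstdint>

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#define SKETCH_TARGET_HASWELL __attribute__((target("arch=haswell,avx2")))
#define SKETCH_HAS_X86_DISPATCH 1
#else
#define SKETCH_TARGET_HASWELL
#define SKETCH_HAS_X86_DISPATCH 0
#endif

// Shared helper: no target attribute, so it uses the baseline flags.
inline uint64_t ShiftOne(uint64_t word, unsigned offset) {
  return word << (offset & 63u);
}

// Only this function opts in to Haswell code generation.
SKETCH_TARGET_HASWELL
uint64_t SumShiftedHaswell(const uint64_t* words, int n, unsigned offset) {
  uint64_t total = 0;
  for (int i = 0; i < n; ++i) total += ShiftOne(words[i], offset);
  return total;
}

// Same logic compiled with the baseline flags.
uint64_t SumShiftedBaseline(const uint64_t* words, int n, unsigned offset) {
  uint64_t total = 0;
  for (int i = 0; i < n; ++i) total += ShiftOne(words[i], offset);
  return total;
}

// Run-time dispatch: take the Haswell path only if the CPU reports AVX2 and BMI2.
uint64_t SumShifted(const uint64_t* words, int n, unsigned offset) {
  assert(n >= 0);
#if SKETCH_HAS_X86_DISPATCH
  if (__builtin_cpu_supports("avx2") && __builtin_cpu_supports("bmi2")) {
    return SumShiftedHaswell(words, n, offset);
  }
#endif
  return SumShiftedBaseline(words, n, offset);
}
```

Under `sde64 -ivb`, a dispatcher like this should fall through to the baseline path, because the emulated Ivy Bridge does not report AVX2 or BMI2.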


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

