westonpace commented on PR #12928:
URL: https://github.com/apache/arrow/pull/12928#issuecomment-1105955726
I played around with this a bit more. I can reproduce it locally by
building with SSE4_2:
```
cmake .. -DARROW_PARQUET=ON -DARROW_SIMD_LEVEL=SSE4_2 \
  -DARROW_RUNTIME_SIMD_LEVEL=MAX -DARROW_BUILD_TESTS=ON
```
From there it's easiest to verify by just manually checking to see which
version ends up in `libparquet.so`:
```
objdump \
  --disassemble=_ZN5arrow8internal21FirstTimeBitmapWriter10AppendWordEml -S \
  ./minsizerel/libparquet.so.800.0.0
```
If the output contains `shlx` then you've reproduced the bug. If it only
contains `shl` then it picked the correct default symbol. If the method is
entirely inlined you get no output.
* The symbol is inlined with `-DCMAKE_BUILD_TYPE=Release`
* The symbol is not inlined with `-DCMAKE_BUILD_TYPE=MinSizeRel`
* However, on my system, the `libparquet.so` file picks the correct version
in all cases, unless...
* I can get an invalid `.so` file if I switch the order in which the object
files are passed to the linker: `/usr/bin/clang++-13 ... level_conversion_bmi2.cc.o
... level_conversion.cc.o ...`
* The symbol is inlined, even with `MinSizeRel`, if I try @kou's fix
(`__attribute__((always_inline))`).
* This seems like the easiest "spot fix" if we wanted to include something
as part of 8.0.0
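A minimal sketch of why the `always_inline` route sidesteps the problem, assuming GCC or clang. The names `ShiftWord` and `AppendShifted` are hypothetical stand-ins, not the actual Arrow code. When the helper is always inlined, no out-of-line copy of it needs to be emitted, so there is no shared symbol for the linker to resolve to a BMI2-compiled copy:

```cpp
#include <cstdint>

// Hypothetical macro for this sketch; falls back to plain `inline`
// on compilers that do not understand the GNU attribute.
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_FORCE_INLINE inline __attribute__((always_inline))
#else
#define SKETCH_FORCE_INLINE inline
#endif

// Stand-in for a small header-defined helper. A variable shift like this
// may compile to `shlx` in a translation unit built with BMI2 enabled,
// and to a plain `shl` otherwise.
SKETCH_FORCE_INLINE uint64_t ShiftWord(uint64_t word, int64_t offset) {
  return word << (offset & 63);
}

// Caller: the helper body is expanded here, using this translation unit's
// own instruction set, instead of calling a symbol shared across objects.
uint64_t AppendShifted(uint64_t current, uint64_t word, int64_t offset) {
  return current | ShiftWord(word, offset);
}
```

Running the same `objdump --disassemble=...` check against the helper's mangled name should then print nothing, as in the fully inlined case described above.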
If you really want to reproduce the issue, I found a tool
[sde64](https://www.intel.com/content/www/us/en/developer/articles/tool/software-development-emulator.html)
which will work if you have an Intel processor. It allows you to simulate
older Intel processors and so you can pretend to have an Ivy Bridge processor
(which does not have AVX2/BMI2 support):
```
sde64 -ivb -- ./minsizerel/parquet-arrow-test \
  --gtest_filter=TestParquetIO/0.SingleNullableListNullableColumnReadWrite
Running main() from ../googletest/src/gtest_main.cc
Note: Google Test filter = TestParquetIO/0.SingleNullableListNullableColumnReadWrite
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from TestParquetIO/0, where TypeParam = arrow::BooleanType
[ RUN ] TestParquetIO/0.SingleNullableListNullableColumnReadWrite
TID 0 SDE-ERROR: Executed instruction not valid for specified chip (IVYBRIDGE): 0x7f80c95b96b3: shlx rax, rbx, rax
Image: /home/pace/dev/arrow/cpp/sse4.2-min-build/minsizerel/libparquet.so.800+0x1666b3
Function: _ZN5arrow8internal21FirstTimeBitmapWriter10AppendWordEml
Instruction bytes are: c4 e2 f9 f7 c3
```
@pitrou has also posted a suggestion on the mailing list that uses pragmas.
I had to include the arch specifier so that it generated
`__attribute__((target("arch=haswell,avx2")))`, and this appears to do what is
expected: the function is compiled with AVX2 but the nested call is not.
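
A rough sketch of the per-function target approach, assuming GCC or clang on x86-64. All function and macro names here are hypothetical, and the runtime check via `__builtin_cpu_supports` is my addition for illustration rather than how Arrow dispatches. Only the function carrying the attribute is built for AVX2; the helper has no attribute, so any out-of-line copy of it is compiled for the build's baseline instruction set:

```cpp
#include <cstdint>

// Hypothetical macros for this sketch. The attribute string is the one
// quoted above; other compilers or architectures get an empty macro.
#if (defined(__GNUC__) || defined(__clang__)) && defined(__x86_64__)
#define SKETCH_HAS_TARGET_ATTR 1
#define SKETCH_TARGET_AVX2 __attribute__((target("arch=haswell,avx2")))
#else
#define SKETCH_HAS_TARGET_ATTR 0
#define SKETCH_TARGET_AVX2
#endif

// Helper without a target attribute: its out-of-line definition uses the
// baseline instruction set chosen on the command line.
uint64_t ShiftBaseline(uint64_t word, int64_t offset) {
  return word << (offset & 63);
}

// Only this function may use Haswell/AVX2 instructions.
SKETCH_TARGET_AVX2
uint64_t SumShiftedAvx2(const uint64_t* words, int64_t n, int64_t offset) {
  uint64_t sum = 0;
  for (int64_t i = 0; i < n; ++i) sum += ShiftBaseline(words[i], offset);
  return sum;
}

// Same logic compiled for the baseline instruction set.
uint64_t SumShiftedDefault(const uint64_t* words, int64_t n, int64_t offset) {
  uint64_t sum = 0;
  for (int64_t i = 0; i < n; ++i) sum += ShiftBaseline(words[i], offset);
  return sum;
}

// Pick the AVX2 variant only when the running CPU reports support for it.
uint64_t SumShifted(const uint64_t* words, int64_t n, int64_t offset) {
#if SKETCH_HAS_TARGET_ATTR
  if (__builtin_cpu_supports("avx2")) return SumShiftedAvx2(words, n, offset);
#endif
  return SumShiftedDefault(words, n, offset);
}
```

Under `sde64 -ivb`, a build along these lines would be expected to take the `SumShiftedDefault` path, since the emulated Ivy Bridge CPU does not report AVX2.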