[ https://issues.apache.org/jira/browse/ARROW-14838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17504626#comment-17504626 ]
Antoine Pitrou commented on ARROW-14838:
----------------------------------------
The path that checks CPU flags on Linux is here:
https://github.com/apache/arrow/blob/d056829e877cdbbe071ea3fb34bd0b9ad42145e6/cpp/src/arrow/util/cpu_info.cc#L386
Granted, that file is a mess :-(
> [C++] Fix issue with valgrind unrecognized instruction
> ------------------------------------------------------
>
> Key: ARROW-14838
> URL: https://issues.apache.org/jira/browse/ARROW-14838
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++
> Reporter: Rob Ambalu
> Priority: Major
>
> This is the same issue as ARROW-9851, but I don't believe the correct fix was
> implemented.
> The merged fix lets you control whether AVX512 is used via a compile flag, but
> that is sub-optimal because:
> a) It forces you to avoid AVX512 instructions solely in order to be able to
> run under valgrind
> b) You don't always have control of the compile flags, e.g. if you are pip
> installing pyarrow
>
> I actually reached out to the valgrind community and they insisted that
> pyarrow is at fault for not checking the CPU feature flags. Apparently
> valgrind will instrument the CPU feature flags when you run under valgrind,
> and it would show that AVX512 is not supported, and so pyarrow should avoid
> using it (I assume arrow would hit a similar issue if run on an actual CPU
> without AVX512 support).
> Please look into dynamically checking the CPU flags to avoid this issue, as
> it has become a major issue for us: we can't valgrind our processes anymore
> now that we import arrow / pyarrow in many places.
>
> valgrind error:
>
> {noformat}
> vex amd64->IR: unhandled instruction bytes: 0x62 0xF1 0xFD 0x8 0x6F 0x44 0x24 0xA 0xC7 0x43
> vex amd64->IR: REX=0 REX.W=0 REX.R=0 REX.X=0 REX.B=0
> vex amd64->IR: VEX=0 VEX.L=0 VEX.nVVVV=0x0 ESC=NONE
> vex amd64->IR: PFX.66=0 PFX.F2=0 PFX.F3=0
> ==32035== valgrind: Unrecognised instruction at address 0x1085ea85.
> ==32035== at 0x1085EA85: arrow::compute::internal::RegisterScalarAggregateSumAvx512(arrow::compute::FunctionRegistry*) (in /home/ra7293/user/hfalgo_csp/.venvs/PY36/build_tools_venv/lib/libarrow.so.100.1.0)
> ==32035== by 0x1067967C: arrow::compute::GetFunctionRegistry() (in /home/ra7293/user/hfalgo_csp/.venvs/PY36/build_tools_venv/lib/libarrow.so.100.1.0)
> {noformat}
>
>
> response from valgrind dev community:
>
> {noformat}
> > 0x62 0xF1 0xFD 0x8 0x6F 0x44 0x24 0xA 0xC7 0x43
>
> This is vmovdqa64 0xa0(%rsp),%xmm0 which requires CPU feature flag
> AVX512VL/AVX512F which is not supported by valgrind 3.18.1.
>
> In the results of the emulated CPUID instruction, valgrind tells
> the running application that AVX512 is not supported.
> It is a bug in pyarrow ( python ) that its initialization routine
> does not check the cpu feature flags, and switch its later run-time
> instructions appropriately. Please report this bug to the maintainers
> of pyarrow. Modern run-time libraries use the STT_GNU_IFUNC feature
> to choose at run time which instruction set is available and may be used.
> pyarrow (and/or python3 itself) must upgrade.
>
> Gcc supports compile-time feature flags to avoid generating code
> that uses certain instructions. Run "info gcc" and search for "avx512",
> paying attention to the flags beginning with "-mavx512", and in particular
> the "-mno-avx512" variants.
>
> Some software administration systems for installing and maintaining
> software packages can limit the hardware features that packages
> may assume. You may be able to work-around valgrind's non-support
> for AVX512 by telling the packaging system to avoid the package
> variants which require AVX512.
> {noformat}
>
--
This message was sent by Atlassian Jira
(v8.20.1#820001)