Re: [PR] [CI][Windows] Workaround for error in Findzstd.cmake [tvm]

via GitHub Thu, 22 Aug 2024 08:36:08 -0700


Lunderberg commented on PR #17283:
URL: https://github.com/apache/tvm/pull/17283#issuecomment-2305055694

I think I've tracked down the problem. Writing down the steps to record it,
and to collect all the links in one spot.

1. The `__truncsfhf2` and `__extendhfsf2` are builtins provided by LLVM.
Calls to these builtins are generated from the LLVM IR instructions `fptrunc`
and `fpext`.
2. In LLVM 15, the ABI of these builtins was changed from accepting `uint16`
arguments to accepting `_Float16` arguments (see [this
thread](https://github.com/llvm/llvm-project/issues/56854). As a result,
TVM-generated code that was compiled under LLVM 15 would be incompatible with
the LLVM 14 runtime.
3. As a result of (2), TVM
[PR#12877](https://github.com/apache/tvm/pull/12877) would inject local
overrides of `__truncsfhf2` and `__extendhfsf2`. That way, when LLVM lowers
`fpext` to `__extendhfsf2`, it uses our local override of `__extendhfsf2`. The
choice of which `__extendhfsf2` is based on whether the target supports SSE2,
matching the decision made by LLVM.
4. After ([this
commit](https://github.com/llvm/llvm-project/commit/489bda6a9c0ec9d2644b7bb0c230294d38f7296e)),
LLVM performs a per-architecture check to determine whether the compiler
supports the use of `_Float16` use of float16. The first release containing
the commit was LLVM 17.

However, this commit also switches from using the LLVM
`builtin_check_c_compiler_source` to the cmake `check_c_source_compiles`. The
former only attempts to compile a string ([source
link](https://github.com/llvm/llvm-project/blob/main/compiler-rt/cmake/Modules/BuiltinTests.cmake#L3)),
while the latter attempts to compile and link the string ([doc
link](https://cmake.org/cmake/help/latest/module/CheckCSourceCompiles.html)).
If I understand correctly, since the string doesn't define `int main`, this
check would always return false.

This looks like a bug in the upstream LLVM implementation, but the
relevance here is that it would mean that LLVM 17 would produce calls to the
`uint32` ABI, while our replacement function would expect calls with the
`_Float16` ABI.

5. In June, the Windows CI runners updated from using LLVM 16 to LLVM 18
([link](https://github.com/actions/runner-images/issues/10001)). The timing of
it is a point against this hypothesis, since the ~2 month gap between when the
new version of LLVM was rolled out in Github and when the issue started
occurring in TVM. It's possible that that was just their gradual rollout, but
that would be a stretch.

To test whether there's an incompatibility between the ABI expected by LLVM,
and the ABI that we provide, I've added more debug statements and commented out
the call to `EmitFloat16ConversionBuiltins`. If commenting out that line
avoids the issue on Windows, then that would at least point to some sort of
incompatibility.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] [CI][Windows] Workaround for error in Findzstd.cmake [tvm]

Reply via email to