[llvm-bugs] [Bug 71223] [CUDA][HIP] fails to compile __int128 division

LLVM Bugs via llvm-bugs Fri, 03 Nov 2023 12:50:45 -0700

Issue	71223
Summary	[CUDA][HIP] fails to compile __int128 division
Labels	new issue
Assignees
Reporter	yxsamliu

    Currently, clang can lower `__int128` add/subtract/multiply operations on nvptx and amdgpu. However, it lowers `__int128` division to the compiler-rt library call `__divti3`. compiler-rt does not currently support the nvptx or amdgpu targets. Even if it did, the amdgpu backend does not support ISA-level linking, so it would be unable to link compiler-rt after LLVM codegen.
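
For illustration, a minimal device function of the kind that triggers the problem might look like the sketch below. It is not the content of the Godbolt links; the name `div128` and the `SKETCH_DEVICE` macro are hypothetical. The macro expands to nothing on a plain host compile, so the same source also builds as ordinary C++.

```cpp
// Hypothetical reproducer sketch. Under a CUDA/HIP compile the function is
// device code; under a plain host compile SKETCH_DEVICE expands to nothing.
#if defined(__CUDACC__) || defined(__HIPCC__) || defined(__CUDA__) || defined(__HIP__)
#define SKETCH_DEVICE __device__
#else
#define SKETCH_DEVICE
#endif

// The `/` below is the operation that gets lowered to a call to `__divti3`.
SKETCH_DEVICE __int128 div128(__int128 a, __int128 b) {
  return a / b;
}
```

On a host target the call to `__divti3` is resolved by compiler-rt or libgcc; in device code there is no such library to link against.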


failure on amdgpu: https://godbolt.org/z/4oqPoYGG9

failure on nvptx: https://godbolt.org/z/411M3x4Eh

`__int128` division on x86_64 showing lowering to `__divti3`: https://godbolt.org/z/b793fE7E5

`__int128` division with nvcc: https://godbolt.org/z/7WaM7vG9j

compiler-rt implementation of 128 bit integer division: https://github.com/llvm/llvm-project/blob/main/compiler-rt/lib/builtins/int_div_impl.inc

Ideally, the nvptx and amdgpu backends should support ISA-level linking and compiler-rt. However, that might take some time.

Another option is to let LLVM lower `__int128` division to instructions instead of a libcall (https://github.com/llvm/llvm-project/blob/main/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp#L4398). However, this may not be worth the effort.

Another option is to implement `__divti3` as an inline function in the default clang header for CUDA/HIP. If `__int128` division is found in device code, mark it as used. This seems to be a feasible solution.
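
A rough sketch of what such a header-only implementation could look like is given below. It uses a simple bit-by-bit shift-subtract loop rather than the compiler-rt algorithm, and the names `sketch_udivti3`, `sketch_divti3`, and `SKETCH_HD` are hypothetical; a real header would have to define `__divti3` itself and arrange for it to be marked as used. The divisor is assumed to be nonzero.

```cpp
// Hypothetical sketch of an inline 128-bit division for device code.
// Not the compiler-rt implementation; all names here are illustrative.
#if defined(__CUDACC__) || defined(__HIPCC__) || defined(__CUDA__) || defined(__HIP__)
#define SKETCH_HD __host__ __device__
#else
#define SKETCH_HD
#endif

typedef unsigned __int128 sketch_u128;
typedef __int128 sketch_s128;

// Unsigned shift-subtract division. Assumes d != 0.
// Optionally stores the remainder through `rem`.
SKETCH_HD static inline sketch_u128 sketch_udivti3(sketch_u128 n, sketch_u128 d,
                                                   sketch_u128 *rem) {
  sketch_u128 q = 0, r = 0;
  for (int i = 127; i >= 0; --i) {
    // Remember the bit shifted out of r so divisors above 2^127 work.
    bool carry = (r >> 127) & 1;
    r = (r << 1) | ((n >> i) & 1);
    if (carry || r >= d) {
      r -= d;  // wraps correctly modulo 2^128 when carry is set
      q |= (sketch_u128)1 << i;
    }
  }
  if (rem) *rem = r;
  return q;
}

// Signed division, truncating toward zero like the C `/` operator.
SKETCH_HD static inline sketch_s128 sketch_divti3(sketch_s128 a, sketch_s128 b) {
  bool neg = (a < 0) != (b < 0);
  // Negate in the unsigned domain to avoid signed overflow on the minimum value.
  sketch_u128 ua = a < 0 ? -(sketch_u128)a : (sketch_u128)a;
  sketch_u128 ub = b < 0 ? -(sketch_u128)b : (sketch_u128)b;
  sketch_u128 q = sketch_udivti3(ua, ub, nullptr);
  return (sketch_s128)(neg ? -q : q);
}
```

The loop always runs 128 iterations, which keeps the sketch short at the cost of speed; a production version would likely skip leading zero bits or use 64-bit hardware division for the common small-operand cases.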

@Artem-B

_______________________________________________
llvm-bugs mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs
