Hello all,

I posted a question on the forums several days back, but suspect that might
not be the right place to be asking what I'm asking, so I'm trying the
mailing list as well.

I'll just repost here what I put in the forums, but the link to that is
here: https://forum.kde.org/viewtopic.php?f=74&t=161199

I'm trying to build Eigen on Mac for CUDA (using the nvcc compiler), and
getting build errors. I understand the errors, and I have a change that
lets me dodge the build failures, but I suspect it's not the right change
for checkin, and so I'm looking for feedback.

So the issue I have is in Half.h. I wind up getting errors about a bunch of
operators being already defined. The core issue is that on Mac, nvcc (the
CUDA compiler) is using gcc as the host compiler, but gcc on Mac is built
on top of clang. Eigen seems to be implicitly assuming that the presence of
clang implies the absence of CUDA (or at least the absence of nvcc CUDA
support).
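
For anyone who wants to confirm the same combination on their own machine, here is a minimal sketch that reports which compiler-identification macros a translation unit sees. The function name compiler_macros_seen is made up for illustration; the macros it checks are the standard predefined ones.

```cpp
#include <string>

// Reports which compiler-identification macros are visible to this
// translation unit. compiler_macros_seen is a hypothetical helper name.
inline std::string compiler_macros_seen() {
  std::string seen;
#if defined(__NVCC__)
  seen += "__NVCC__ ";   // set when nvcc is driving the compile
#endif
#if defined(__clang__)
  seen += "__clang__ ";  // set when the (host) compiler is clang
#endif
#if defined(__GNUC__)
  seen += "__GNUC__ ";   // set by gcc, and also by clang for compatibility
#endif
#if defined(_MSC_VER)
  seen += "_MSC_VER ";   // set by MSVC
#endif
  if (seen.empty()) seen = "(none of the above)";
  return seen;
}
```

Printing the result from a small main(), once built with the plain host compiler and once built through nvcc, should show whether __NVCC__ and __clang__ are both set at the same time, which is the case that seems to trip up Half.h.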

In my build I'm hitting this block:

#if (defined(EIGEN_HAS_CUDA_FP16) && defined(EIGEN_CUDA_ARCH) && \
     EIGEN_CUDA_ARCH >= 530) ||                                  \
    (defined(EIGEN_HAS_HIP_FP16) && defined(HIP_DEVICE_COMPILE))
#define EIGEN_HAS_NATIVE_FP16
#endif

which results in EIGEN_HAS_NATIVE_FP16 being set, and so we wind up
compiling in all the operators from Half.h:253-313. That's fine so far.

What happens next is we hit this line:

#if !defined(EIGEN_HAS_NATIVE_FP16) || EIGEN_COMP_CLANG // Emulate support for half floats

which is followed shortly after by (roughly) the same operator functions
(but... emulated), and I get errors because those operator functions were
defined above.
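
To make the conflict concrete, here is a stripped-down sketch of the two guards interacting. All DEMO_* names and demo_half are hypothetical stand-ins, not Eigen's macros or types; the second block only records that it is active, since actually repeating the operator there would reproduce the redefinition error.

```cpp
// DEMO_* names are hypothetical stand-ins for the Eigen macros discussed.
#define DEMO_HAS_NATIVE_FP16   // pretend the native-FP16 check passed
#define DEMO_COMP_CLANG 1      // pretend the host compiler is clang

struct demo_half { float value; };

#if defined(DEMO_HAS_NATIVE_FP16)
// First block: the "native" operators get defined here.
inline demo_half operator+(demo_half a, demo_half b) {
  return demo_half{a.value + b.value};
}
#define DEMO_NATIVE_BLOCK_ACTIVE 1
#else
#define DEMO_NATIVE_BLOCK_ACTIVE 0
#endif

#if !defined(DEMO_HAS_NATIVE_FP16) || DEMO_COMP_CLANG
// Second block: the "emulated" operators would be defined here.
// Defining operator+ again at this point is a redefinition error,
// so this sketch only records that the block is active.
#define DEMO_EMULATED_BLOCK_ACTIVE 1
#else
#define DEMO_EMULATED_BLOCK_ACTIVE 0
#endif

// Number of operator blocks the preprocessor kept for this configuration.
inline int active_operator_blocks() {
  return DEMO_NATIVE_BLOCK_ACTIVE + DEMO_EMULATED_BLOCK_ACTIVE;
}
```

With both stand-in macros set, both blocks survive preprocessing, which is the situation that produces the "already defined" errors. Setting DEMO_COMP_CLANG to 0 leaves only the first block.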

So. My hack to work around this is to ensure that EIGEN_COMP_CLANG gets set
to 0 in Macros.h if __NVCC__ is defined. That works fine for me locally,
and gets Eigen building fine (and thus unblocks me on getting TensorFlow
building for Mac, or at least unblocks this issue).
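
Roughly, the workaround looks like the sketch below. DEMO_COMP_CLANG is a hypothetical stand-in for EIGEN_COMP_CLANG, and the version arithmetic is only an assumption about how such a macro might be computed; this is not a copy of Eigen's Macros.h.

```cpp
// Hypothetical stand-in for the clang-detection macro, with the extra
// !defined(__NVCC__) condition that the workaround adds.
#if defined(__clang__) && !defined(__NVCC__)
#define DEMO_COMP_CLANG (__clang_major__ * 100 + __clang_minor__)
#else
#define DEMO_COMP_CLANG 0
#endif

// Value the stand-in macro takes for the current compiler.
constexpr int demo_comp_clang() { return DEMO_COMP_CLANG; }

// Mirrors the shape of the second guard in Half.h: the emulated block is
// kept when native FP16 is missing or the clang macro is nonzero.
constexpr bool emulated_block_active(bool has_native_fp16, int comp_clang) {
  return !has_native_fp16 || comp_clang != 0;
}
```

Under nvcc this forces the macro to 0, so emulated_block_active(true, demo_comp_clang()) is false and only the native operators remain. As far as I understand, clang's own CUDA mode does not define __NVCC__, so that path should be unaffected, but the change also hides clang from every other place that checks the macro.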

I'm willing to bet however that this is the wrong thing to do in general. I
don't understand enough of what this second code block is doing to really
understand why clang is being treated differently than nvcc here (and
specifically why half support needs to be emulated in the presence of
clang). I believe there is a version of clang that supports CUDA (at least
on some platforms?). Presumably this is for that, but I don't know enough
about how that differs from nvcc to fully grok this.

Can anyone help enlighten me about the best way to fix this?

Thanks!
---
Eric Klein
[email protected]
