Re: [eigen] Mac CUDA build failure question

Eric Klein Thu, 20 Jun 2019 15:05:49 -0700

Ok. Looks like the warnings are there with and without my hack(s), and the
minimal set of edits needed to get Eigen to build on Mac with nvcc consists
of:


Half.h. Change this:
#if !defined(EIGEN_HAS_NATIVE_FP16) || EIGEN_COMP_CLANG // Emulate support
for half floats
to this:
#if !defined(EIGEN_HAS_NATIVE_FP16) || (EIGEN_COMP_CLANG &&
!EIGEN_COMP_NVCC) // Emulate support for half floats

And in PacketMath.h, change this:
#if defined(EIGEN_CUDA_ARCH) || defined(EIGEN_HIP_DEVICE_COMPILE) ||
(defined(EIGEN_CUDACC) && EIGEN_COMP_CLANG)
to this:
#if defined(EIGEN_CUDA_ARCH) || defined(EIGEN_HIP_DEVICE_COMPILE) ||
(defined(EIGEN_CUDACC) && EIGEN_COMP_CLANG && !EIGEN_COMP_NVCC)

Obviously that's excluding any enlightening comments about why that's being
done.

Would you like me to prepare a patch file, or is this something that would
be better handled by one of the regular contributors?

Thank you!
---
Eric Klein
[email protected]


On Thu, Jun 20, 2019 at 12:01 AM Eric Klein <[email protected]> wrote:

> That appears to work, although there are 2-3 other places that need
> similar modifications in order to work. I'll try to get you a more complete
> list tomorrow.
>
> I'm paying more attention tonight to warnings coming from Eigen than I had
> been previously ignoring, and both with my old Macros.h based hack and the
> newer modifications, I'm seeing some of these: "warning: calling a __host__
> function from a __host__ __device__ function is not allowed". A
> representative one is:
>
> external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorReduction.h(735):
> warning: calling a __host__ function from a __host__ __device__ function is
> not allowed
>           detected during:
>             instantiation of "__nv_bool Eigen::TensorEvaluator<const
> Eigen::TensorReductionOp<Op, Dims, ArgType, MakePointer_>,
> Device>::evalSubExprsIfNeeded(MakePointer_<Eigen::TensorEvaluator<const
> Eigen::TensorReductionOp<Op, Dims, ArgType, MakePointer_>,
> Device>::CoeffReturnType>::Type) [with
> Op=Eigen::internal::AvgPoolMeanReducer<double>, Dims=const
> Eigen::IndexList<Eigen::type2index<1L>, Eigen::type2index<2L>>,
> ArgType=const Eigen::TensorImagePatchOp<-1L, -1L, const
> Eigen::TensorLayoutSwapOp<const Eigen::TensorMap<Eigen::Tensor<const
> double, 4, 1, Eigen::DenseIndex>, 16, Eigen::MakePointer>>>,
> MakePointer_=Eigen::MakePointer, Device=Eigen::GpuDevice]"
> external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorMorphing.h(172):
> here
>             instantiation of "__nv_bool Eigen::TensorEvaluator<const
> Eigen::TensorReshapingOp<NewDimensions, ArgType>,
> Device>::evalSubExprsIfNeeded(Eigen::TensorEvaluator<const
> Eigen::TensorReshapingOp<NewDimensions, ArgType>, Device>::CoeffReturnType
> *) [with NewDimensions=const Eigen::DSizes<Eigen::DenseIndex, 4>,
> ArgType=const
> Eigen::TensorReductionOp<Eigen::internal::AvgPoolMeanReducer<double>, const
> Eigen::IndexList<Eigen::type2index<1L>, Eigen::type2index<2L>>, const
> Eigen::TensorImagePatchOp<-1L, -1L, const Eigen::TensorLayoutSwapOp<const
> Eigen::TensorMap<Eigen::Tensor<const double, 4, 1, Eigen::DenseIndex>, 16,
> Eigen::MakePointer>>>, Eigen::MakePointer>, Device=Eigen::GpuDevice]"
> external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorAssign.h(146):
> here
>             instantiation of "__nv_bool Eigen::TensorEvaluator<const
> Eigen::TensorAssignOp<LeftArgType, RightArgType>,
> Device>::evalSubExprsIfNeeded(Eigen::TensorEvaluator<const
> Eigen::TensorAssignOp<LeftArgType, RightArgType>, Device>::Scalar *) [with
> LeftArgType=Eigen::TensorLayoutSwapOp<Eigen::TensorMap<Eigen::Tensor<double,
> 4, 1, Eigen::DenseIndex>, 16, Eigen::MakePointer>>, RightArgType=const
> Eigen::TensorReshapingOp<const Eigen::DSizes<Eigen::DenseIndex, 4>, const
> Eigen::TensorReductionOp<Eigen::internal::AvgPoolMeanReducer<double>, const
> Eigen::IndexList<Eigen::type2index<1L>, Eigen::type2index<2L>>, const
> Eigen::TensorImagePatchOp<-1L, -1L, const Eigen::TensorLayoutSwapOp<const
> Eigen::TensorMap<Eigen::Tensor<const double, 4, 1, Eigen::DenseIndex>, 16,
> Eigen::MakePointer>>>, Eigen::MakePointer>>, Device=Eigen::GpuDevice]"
> external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorExecutor.h(422):
> here
>             instantiation of "void
> Eigen::internal::TensorExecutor<Expression, Eigen::GpuDevice, Vectorizable,
> Tileable>::run(const Expression &, const Eigen::GpuDevice &) [with
> Expression=const
> Eigen::TensorAssignOp<Eigen::TensorLayoutSwapOp<Eigen::TensorMap<Eigen::Tensor<double,
> 4, 1, Eigen::DenseIndex>, 16, Eigen::MakePointer>>, const
> Eigen::TensorReshapingOp<const Eigen::DSizes<Eigen::DenseIndex, 4>, const
> Eigen::TensorReductionOp<Eigen::internal::AvgPoolMeanReducer<double>, const
> Eigen::IndexList<Eigen::type2index<1L>, Eigen::type2index<2L>>, const
> Eigen::TensorImagePatchOp<-1L, -1L, const Eigen::TensorLayoutSwapOp<const
> Eigen::TensorMap<Eigen::Tensor<const double, 4, 1, Eigen::DenseIndex>, 16,
> Eigen::MakePointer>>>, Eigen::MakePointer>>>, Vectorizable=false,
> Tileable=false]"
> external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorDevice.h(35):
> here
>             instantiation of "Eigen::TensorDevice<ExpressionType,
> DeviceType> &Eigen::TensorDevice<ExpressionType,
> DeviceType>::operator=(const OtherDerived &) [with
> ExpressionType=Eigen::TensorLayoutSwapOp<Eigen::TensorMap<Eigen::Tensor<double,
> 4, 1, Eigen::DenseIndex>, 16, Eigen::MakePointer>>,
> DeviceType=tensorflow::GPUDevice,
> OtherDerived=Eigen::TensorReshapingOp<const
> Eigen::DSizes<Eigen::DenseIndex, 4>, const
> Eigen::TensorReductionOp<Eigen::internal::AvgPoolMeanReducer<double>, const
> Eigen::IndexList<Eigen::type2index<1L>, Eigen::type2index<2L>>, const
> Eigen::TensorImagePatchOp<-1L, -1L, const Eigen::TensorLayoutSwapOp<const
> Eigen::TensorMap<Eigen::Tensor<const double, 4, 1, Eigen::DenseIndex>, 16,
> Eigen::MakePointer>>>, Eigen::MakePointer>>]"
> ./tensorflow/core/kernels/avgpooling_op.h(42): here
>             instantiation of "void
> tensorflow::functor::SpatialAvgPooling<Device, T>::operator()(const Device
> &, tensorflow::TTypes<T, 4, Eigen::DenseIndex>::Tensor,
> tensorflow::TTypes<T, 4, Eigen::DenseIndex>::ConstTensor, int, int, int,
> int, const Eigen::PaddingType &) [with Device=tensorflow::GPUDevice,
> T=double]"
> tensorflow/core/kernels/avgpooling_op_gpu.cu.cc(38): here
>
>
> I'm not sure how concerned I should be about these. The build will
> succeed, but... I wouldn't be at all surprised to get weird results
> eventually.
>
> In this particular case, it looks like it's complaining because
> Eigen::GpuDevice::allocate_temp appears to be __host__ rather than __host__
> __device__ (i.e. missing EIGEN_DEVICE_FUNC). I fully admit that I could be
> misinterpreting that or otherwise misunderstanding something basic.
>
> Should I be concerned about these?
>
> Thanks!
> ---
> Eric Klein
> [email protected]
>
>
> On Wed, Jun 19, 2019 at 5:21 PM Rasmus Munk Larsen <[email protected]>
> wrote:
>
>> Erik, does Artem's suggestion work for you?
>>
>> On Wed, Jun 19, 2019 at 2:52 PM Artem Belevich <[email protected]> wrote:
>>
>>>
>>>
>>> On Wed, Jun 19, 2019 at 1:47 PM Rasmus Munk Larsen <[email protected]>
>>> wrote:
>>>
>>>> It looks like we broke the Eigen Cuda build on Mac. What do you think
>>>> about his workaround?
>>>>
>>>> ---------- Forwarded message ---------
>>>> From: Eric Klein <[email protected]>
>>>> Date: Wed, Jun 19, 2019 at 1:39 PM
>>>> Subject: [eigen] Mac CUDA build failure question
>>>> To: <[email protected]>
>>>>
>>>>
>>>> Hello all,
>>>>
>>>> I posted a question on the forums several days back, but suspect that
>>>> might not be the right place to be asking what I'm asking, so I'm trying
>>>> the mailing list as well.
>>>>
>>>> I'll just repost here what I put in the forums, but the link to that is
>>>> here: https://forum.kde.org/viewtopic.php?f=74&t=161199
>>>>
>>>> I'm trying to build Eigen on Mac for CUDA (using the nvcc compiler),
>>>> and getting build errors. I understand the errors, and I have a change that
>>>> lets me dodge the build failures, but I suspect it's not the right change
>>>> for checkin, and so I'm looking for feedback.
>>>>
>>>> So the issue I have is in Half.h. I wind up getting errors about a
>>>> bunch of operators being already defined. The core issue is that on Mac,
>>>> nvcc (the CUDA compliler) is using gcc as the host compiler, but gcc on Mac
>>>> is built on top of clang. Eigen seems to be implicitly assuming that the
>>>> presence of clang implies that absence of CUDA (or at least the absence of
>>>> nvcc CUDA support).
>>>>
>>>> In my build I'm hitting this block:
>>>>
>>>> #if (defined(EIGEN_HAS_CUDA_FP16) && defined(EIGEN_CUDA_ARCH) && \
>>>>      EIGEN_CUDA_ARCH >= 530) ||                                  \
>>>>     (defined(EIGEN_HAS_HIP_FP16) && defined(HIP_DEVICE_COMPILE))
>>>> #define EIGEN_HAS_NATIVE_FP16
>>>> #endif
>>>>
>>>> which results in EIGEN_HAS_NATIVE_FP16 being set, and so we wind up
>>>> compiling in all the operators from Half.h:253-313. That's fine so far.
>>>>
>>>
>>> This assumes device-side compilation.
>>>
>>>
>>>>
>>>> What happens next is we hit this line:
>>>>
>>>> #if !defined(EIGEN_HAS_NATIVE_FP16) || EIGEN_COMP_CLANG // Emulate
>>>> support for half floats
>>>>
>>>> which is followed shortly after by (roughly) the same operator
>>>> functions (but... emulated), and I get errors because those operator
>>>> functions were defined above.
>>>>
>>>
>>> If Clang were CUDA compiler, that would not be a problem. This implies
>>> that it's a CUDA compilation with NVCC. What puzzles me is how did we end
>>> up with EIGEN_COMP_CLANG defined for the *device* side of the
>>> compilation. I suspect it's the side effect of NVCC doing device-side
>>> preprocessing with clang, but actually compiling with cicc, which is
>>> obviously not clang.
>>>
>>> I guess what we need to do here is something like this:
>>> #if !defined(EIGEN_HAS_NATIVE_FP16) || (EIGEN_COMP_CLANG && !
>>> EIGEN_COMP_NVCC)
>>>
>>> That, and a comment explaining what's going on.
>>>
>>> If that does not help, it would be great to compile with '-keep
>>> -verbose' and check which compilation phase is failing and what exactly is
>>> it trying to compile.
>>>
>>> --Artem
>>>
>>>
>>>> So. My hack to work around this is to ensure that EIGEN_COMP_CLANG gets
>>>> set to 0 in Macros.h if __NVCC__ is defined. That works fine for me
>>>> locally, and gets Eigen building fine (and thus unblocks me on getting
>>>> TensorFlow building for Mac, or at least unblocks this issue).
>>>>
>>>> I'm willing to bet however that this is the wrong thing to do in
>>>> general. I don't understand enough of what this second code block is doing
>>>> to really understand why clang is being treated differently than nvcc here
>>>> (and specifically why half support needs to be emulated in the presence of
>>>> clang). I believe there is a version of clang that supports CUDA (at least
>>>> on some platforms?). Presumably this is for that, but I don't know enough
>>>> about how that differs from nvcc to fully grok this.
>>>>
>>>> Can anyone help enlighten me about the best way to fix this?
>>>>
>>>> Thanks!
>>>> ---
>>>> Eric Klein
>>>> [email protected]
>>>>
>>>
>>>
>>> --
>>> --Artem Belevich
>>>
>>

Re: [eigen] Mac CUDA build failure question

Reply via email to