That appears to work, although there are 2-3 other places that need similar
modifications. I'll try to get you a more complete list tomorrow.

I'm paying more attention tonight to warnings coming from Eigen that I had
previously been ignoring, and with both my old Macros.h-based hack and the
newer modifications I'm seeing some of these: "warning: calling a __host__
function from a __host__ __device__ function is not allowed". A
representative one is:

external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorReduction.h(735):
warning: calling a __host__ function from a __host__ __device__ function is
not allowed
          detected during:
            instantiation of "__nv_bool Eigen::TensorEvaluator<const
Eigen::TensorReductionOp<Op, Dims, ArgType, MakePointer_>,
Device>::evalSubExprsIfNeeded(MakePointer_<Eigen::TensorEvaluator<const
Eigen::TensorReductionOp<Op, Dims, ArgType, MakePointer_>,
Device>::CoeffReturnType>::Type) [with
Op=Eigen::internal::AvgPoolMeanReducer<double>, Dims=const
Eigen::IndexList<Eigen::type2index<1L>, Eigen::type2index<2L>>,
ArgType=const Eigen::TensorImagePatchOp<-1L, -1L, const
Eigen::TensorLayoutSwapOp<const Eigen::TensorMap<Eigen::Tensor<const
double, 4, 1, Eigen::DenseIndex>, 16, Eigen::MakePointer>>>,
MakePointer_=Eigen::MakePointer, Device=Eigen::GpuDevice]"
external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorMorphing.h(172):
here
            instantiation of "__nv_bool Eigen::TensorEvaluator<const
Eigen::TensorReshapingOp<NewDimensions, ArgType>,
Device>::evalSubExprsIfNeeded(Eigen::TensorEvaluator<const
Eigen::TensorReshapingOp<NewDimensions, ArgType>, Device>::CoeffReturnType
*) [with NewDimensions=const Eigen::DSizes<Eigen::DenseIndex, 4>,
ArgType=const
Eigen::TensorReductionOp<Eigen::internal::AvgPoolMeanReducer<double>, const
Eigen::IndexList<Eigen::type2index<1L>, Eigen::type2index<2L>>, const
Eigen::TensorImagePatchOp<-1L, -1L, const Eigen::TensorLayoutSwapOp<const
Eigen::TensorMap<Eigen::Tensor<const double, 4, 1, Eigen::DenseIndex>, 16,
Eigen::MakePointer>>>, Eigen::MakePointer>, Device=Eigen::GpuDevice]"
external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorAssign.h(146):
here
            instantiation of "__nv_bool Eigen::TensorEvaluator<const
Eigen::TensorAssignOp<LeftArgType, RightArgType>,
Device>::evalSubExprsIfNeeded(Eigen::TensorEvaluator<const
Eigen::TensorAssignOp<LeftArgType, RightArgType>, Device>::Scalar *) [with
LeftArgType=Eigen::TensorLayoutSwapOp<Eigen::TensorMap<Eigen::Tensor<double,
4, 1, Eigen::DenseIndex>, 16, Eigen::MakePointer>>, RightArgType=const
Eigen::TensorReshapingOp<const Eigen::DSizes<Eigen::DenseIndex, 4>, const
Eigen::TensorReductionOp<Eigen::internal::AvgPoolMeanReducer<double>, const
Eigen::IndexList<Eigen::type2index<1L>, Eigen::type2index<2L>>, const
Eigen::TensorImagePatchOp<-1L, -1L, const Eigen::TensorLayoutSwapOp<const
Eigen::TensorMap<Eigen::Tensor<const double, 4, 1, Eigen::DenseIndex>, 16,
Eigen::MakePointer>>>, Eigen::MakePointer>>, Device=Eigen::GpuDevice]"
external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorExecutor.h(422):
here
            instantiation of "void
Eigen::internal::TensorExecutor<Expression, Eigen::GpuDevice, Vectorizable,
Tileable>::run(const Expression &, const Eigen::GpuDevice &) [with
Expression=const
Eigen::TensorAssignOp<Eigen::TensorLayoutSwapOp<Eigen::TensorMap<Eigen::Tensor<double,
4, 1, Eigen::DenseIndex>, 16, Eigen::MakePointer>>, const
Eigen::TensorReshapingOp<const Eigen::DSizes<Eigen::DenseIndex, 4>, const
Eigen::TensorReductionOp<Eigen::internal::AvgPoolMeanReducer<double>, const
Eigen::IndexList<Eigen::type2index<1L>, Eigen::type2index<2L>>, const
Eigen::TensorImagePatchOp<-1L, -1L, const Eigen::TensorLayoutSwapOp<const
Eigen::TensorMap<Eigen::Tensor<const double, 4, 1, Eigen::DenseIndex>, 16,
Eigen::MakePointer>>>, Eigen::MakePointer>>>, Vectorizable=false,
Tileable=false]"
external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorDevice.h(35):
here
            instantiation of "Eigen::TensorDevice<ExpressionType,
DeviceType> &Eigen::TensorDevice<ExpressionType,
DeviceType>::operator=(const OtherDerived &) [with
ExpressionType=Eigen::TensorLayoutSwapOp<Eigen::TensorMap<Eigen::Tensor<double,
4, 1, Eigen::DenseIndex>, 16, Eigen::MakePointer>>,
DeviceType=tensorflow::GPUDevice,
OtherDerived=Eigen::TensorReshapingOp<const
Eigen::DSizes<Eigen::DenseIndex, 4>, const
Eigen::TensorReductionOp<Eigen::internal::AvgPoolMeanReducer<double>, const
Eigen::IndexList<Eigen::type2index<1L>, Eigen::type2index<2L>>, const
Eigen::TensorImagePatchOp<-1L, -1L, const Eigen::TensorLayoutSwapOp<const
Eigen::TensorMap<Eigen::Tensor<const double, 4, 1, Eigen::DenseIndex>, 16,
Eigen::MakePointer>>>, Eigen::MakePointer>>]"
./tensorflow/core/kernels/avgpooling_op.h(42): here
            instantiation of "void
tensorflow::functor::SpatialAvgPooling<Device, T>::operator()(const Device
&, tensorflow::TTypes<T, 4, Eigen::DenseIndex>::Tensor,
tensorflow::TTypes<T, 4, Eigen::DenseIndex>::ConstTensor, int, int, int,
int, const Eigen::PaddingType &) [with Device=tensorflow::GPUDevice,
T=double]"
tensorflow/core/kernels/avgpooling_op_gpu.cu.cc(38): here


I'm not sure how concerned I should be about these. The build will succeed,
but... I wouldn't be at all surprised to get weird results eventually.

In this particular case, it looks like it's complaining because
Eigen::GpuDevice::allocate_temp appears to be __host__ rather than __host__
__device__ (i.e. missing EIGEN_DEVICE_FUNC). I fully admit that I could be
misinterpreting that or otherwise misunderstanding something basic.
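
As a sketch of the pattern nvcc is objecting to (made-up names, not Eigen's
actual code), something like this in a .cu file produces the same warning,
since EIGEN_DEVICE_FUNC expands to __host__ __device__ when compiling with
nvcc:

#include <cstdlib>

void* host_only_alloc(std::size_t n) {  // implicitly __host__ only
  return std::malloc(n);
}

// Analogous to a member declared without EIGEN_DEVICE_FUNC:
__host__ __device__ void* evaluator_setup(std::size_t n) {
  // nvcc warns on the device-side pass: "calling a __host__ function
  // from a __host__ __device__ function is not allowed". The host path
  // is fine; actually reaching this call on the device would not be.
  return host_only_alloc(n);
}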

Should I be concerned about these?

Thanks!
---
Eric Klein
[email protected]


On Wed, Jun 19, 2019 at 5:21 PM Rasmus Munk Larsen <[email protected]>
wrote:

> Eric, does Artem's suggestion work for you?
>
> On Wed, Jun 19, 2019 at 2:52 PM Artem Belevich <[email protected]> wrote:
>
>>
>>
>> On Wed, Jun 19, 2019 at 1:47 PM Rasmus Munk Larsen <[email protected]>
>> wrote:
>>
>>> It looks like we broke the Eigen CUDA build on Mac. What do you think
>>> about his workaround?
>>>
>>> ---------- Forwarded message ---------
>>> From: Eric Klein <[email protected]>
>>> Date: Wed, Jun 19, 2019 at 1:39 PM
>>> Subject: [eigen] Mac CUDA build failure question
>>> To: <[email protected]>
>>>
>>>
>>> Hello all,
>>>
>>> I posted a question on the forums several days back, but suspect that
>>> might not be the right place for it, so I'm trying the mailing list as
>>> well.
>>>
>>> I'll just repost here what I put in the forums, but the link to that is
>>> here: https://forum.kde.org/viewtopic.php?f=74&t=161199
>>>
>>> I'm trying to build Eigen on Mac for CUDA (using the nvcc compiler), and
>>> getting build errors. I understand the errors, and I have a change that
>>> lets me dodge the build failures, but I suspect it's not the right change
>>> for checkin, and so I'm looking for feedback.
>>>
>>> So the issue I have is in Half.h. I wind up getting errors about a bunch
>>> of operators being already defined. The core issue is that on Mac, nvcc
>>> (the CUDA compiler) is using gcc as the host compiler, but gcc on Mac is
>>> built on top of clang. Eigen seems to be implicitly assuming that the
>>> presence of clang implies the absence of CUDA (or at least the absence of
>>> nvcc CUDA support).
>>>
>>> In my build I'm hitting this block:
>>>
>>> #if (defined(EIGEN_HAS_CUDA_FP16) && defined(EIGEN_CUDA_ARCH) && \
>>>      EIGEN_CUDA_ARCH >= 530) ||                                  \
>>>     (defined(EIGEN_HAS_HIP_FP16) && defined(HIP_DEVICE_COMPILE))
>>> #define EIGEN_HAS_NATIVE_FP16
>>> #endif
>>>
>>> which results in EIGEN_HAS_NATIVE_FP16 being set, and so we wind up
>>> compiling in all the operators from Half.h:253-313. That's fine so far.
>>>
>>
>> This assumes device-side compilation.
>>
>>
>>>
>>> What happens next is we hit this line:
>>>
>>> #if !defined(EIGEN_HAS_NATIVE_FP16) || EIGEN_COMP_CLANG // Emulate support for half floats
>>>
>>> which is shortly followed by (roughly) the same operator functions (but
>>> emulated), and I get errors because those operator functions were already
>>> defined above.
>>>
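>>> To illustrate the collision (made-up names, not the real Half.h code):
>>> when both guards are true, the same signature gets defined twice in one
>>> translation unit.
>>>
>>> struct half_t { unsigned short x; };
>>>
>>> // From the EIGEN_HAS_NATIVE_FP16 block:
>>> inline half_t operator+(const half_t& a, const half_t& b) {
>>>   half_t r; r.x = a.x + b.x; return r;  // stand-in body
>>> }
>>>
>>> // From the emulation block; since both guards are true, the compiler
>>> // rejects this with "error: redefinition of 'operator+'":
>>> inline half_t operator+(const half_t& a, const half_t& b) {
>>>   half_t r; r.x = a.x + b.x; return r;
>>> }
>>>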
>>
>> If Clang were the CUDA compiler, that would not be a problem, so this
>> implies that it's a CUDA compilation with NVCC. What puzzles me is how we
>> ended up with EIGEN_COMP_CLANG defined for the *device* side of the
>> compilation. I suspect it's a side effect of NVCC doing device-side
>> preprocessing with clang but actually compiling with cicc, which is
>> obviously not clang.
>>
>> I guess what we need to do here is something like this:
>> #if !defined(EIGEN_HAS_NATIVE_FP16) || (EIGEN_COMP_CLANG && !EIGEN_COMP_NVCC)
>>
>> That, and a comment explaining what's going on.
>>
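>> Something along these lines, as a sketch (the comment wording is mine,
>> and I'm assuming EIGEN_COMP_NVCC is derived from __NVCC__ as usual):
>>
>> // NVCC preprocesses the device-side pass with the host compiler (clang
>> // on Mac), so EIGEN_COMP_CLANG can be defined even though the actual
>> // device compiler is cicc, not clang. Only take the emulated-half path
>> // for a genuine clang CUDA compile, never under NVCC.
>> #if !defined(EIGEN_HAS_NATIVE_FP16) || (EIGEN_COMP_CLANG && !EIGEN_COMP_NVCC)
>>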
>> If that does not help, it would be great to compile with '-keep -verbose'
>> and check which compilation phase is failing and what exactly it is trying
>> to compile.
>>
>> --Artem
>>
>>
>>> So. My hack to work around this is to ensure that EIGEN_COMP_CLANG gets
>>> set to 0 in Macros.h if __NVCC__ is defined. That works for me locally
>>> and gets Eigen building fine (and thus unblocks me on getting TensorFlow
>>> building for Mac, or at least unblocks this issue).
>>>
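>>> Concretely, the hack amounts to something like this in Macros.h (the
>>> exact macro body upstream may differ; the !defined(__NVCC__) clause is
>>> the change):
>>>
>>> #if defined(__clang__) && !defined(__NVCC__)
>>> #define EIGEN_COMP_CLANG (__clang_major__*100+__clang_minor__)
>>> #else
>>> #define EIGEN_COMP_CLANG 0
>>> #endif
>>>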
>>> I'm willing to bet, however, that this is the wrong thing to do in
>>> general. I don't understand enough of what this second code block is
>>> doing to see why clang is being treated differently from nvcc here (and
>>> specifically why half support needs to be emulated in the presence of
>>> clang). I believe there is a version of clang that supports CUDA (at
>>> least on some platforms?). Presumably this is for that, but I don't know
>>> enough about how that differs from nvcc to fully grok this.
>>>
>>> Can anyone help enlighten me about the best way to fix this?
>>>
>>> Thanks!
>>> ---
>>> Eric Klein
>>> [email protected]
>>>
>>
>>
>> --
>> --Artem Belevich
>>
>
