Thank you both very much for the help with this. It's much appreciated.
---
Eric Klein
[email protected]
On Thu, Jun 20, 2019 at 3:46 PM Rasmus Munk Larsen <[email protected]> wrote:

PR is here:
https://bitbucket.org/eigen/eigen/pull-requests/659/fix-cuda-build-on-mac/diff

On Thu, Jun 20, 2019 at 3:12 PM Eric Klein <[email protected]> wrote:

Thank you both. I appreciate the help with this.
---
Eric Klein
[email protected]

On Thu, Jun 20, 2019 at 3:11 PM Artem Belevich <[email protected]> wrote:

The changes look reasonable to me. Thank you for helping to sort this out.

On Thu, Jun 20, 2019 at 3:05 PM Eric Klein <[email protected]> wrote:

OK. It looks like the warnings are there both with and without my hack(s), and the minimal set of edits needed to get Eigen to build on Mac with nvcc consists of:

In Half.h, change this:

    #if !defined(EIGEN_HAS_NATIVE_FP16) || EIGEN_COMP_CLANG // Emulate support for half floats

to this:

    #if !defined(EIGEN_HAS_NATIVE_FP16) || (EIGEN_COMP_CLANG && !EIGEN_COMP_NVCC) // Emulate support for half floats

And in PacketMath.h, change this:

    #if defined(EIGEN_CUDA_ARCH) || defined(EIGEN_HIP_DEVICE_COMPILE) || (defined(EIGEN_CUDACC) && EIGEN_COMP_CLANG)

to this:

    #if defined(EIGEN_CUDA_ARCH) || defined(EIGEN_HIP_DEVICE_COMPILE) || (defined(EIGEN_CUDACC) && EIGEN_COMP_CLANG && !EIGEN_COMP_NVCC)

That obviously excludes any enlightening comments about why it's being done.

Would you like me to prepare a patch file, or is this something that would be better handled by one of the regular contributors?

Thank you!
---
Eric Klein
[email protected]

On Thu, Jun 20, 2019 at 12:01 AM Eric Klein <[email protected]> wrote:

That appears to work, although there are two or three other places that need similar modifications in order for things to work. I'll try to get you a more complete list tomorrow.
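[Editor's note: the combined effect of the two guard changes listed above can be sanity-checked off-line with a plain C++ sketch. The booleans below merely stand in for the EIGEN_HAS_NATIVE_FP16, EIGEN_COMP_CLANG, and EIGEN_COMP_NVCC macros; this is illustrative, not Eigen code.]

```cpp
#include <cassert>

// Illustrative stand-in for the fixed Half.h guard; the parameters model
// EIGEN_HAS_NATIVE_FP16, EIGEN_COMP_CLANG, and EIGEN_COMP_NVCC.
constexpr bool emulate_half(bool has_native_fp16, bool is_clang, bool is_nvcc) {
  // Emulate only when there is no native FP16, or when the compiler is
  // clang but we are NOT inside an nvcc compilation. The old guard lacked
  // the !is_nvcc term, so a Mac nvcc build (gcc-on-clang host) took both
  // the native and the emulated branches.
  return !has_native_fp16 || (is_clang && !is_nvcc);
}

// The failing configuration: native FP16 available, clang detected, nvcc
// driving the build -> the emulated operators must stay disabled.
static_assert(!emulate_half(/*fp16*/ true, /*clang*/ true, /*nvcc*/ true), "");

// clang acting as the CUDA compiler itself keeps its emulation path.
static_assert(emulate_half(true, true, false), "");

// No native FP16 at all: always emulate.
static_assert(emulate_half(false, false, false), "");
```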
I'm paying more attention tonight to warnings coming from Eigen that I had previously been ignoring, and both with my old Macros.h-based hack and the newer modifications I'm seeing some of these: "warning: calling a __host__ function from a __host__ __device__ function is not allowed". A representative one is:

    external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorReduction.h(735): warning: calling a __host__ function from a __host__ __device__ function is not allowed
      detected during:
        instantiation of "__nv_bool Eigen::TensorEvaluator<const Eigen::TensorReductionOp<Op, Dims, ArgType, MakePointer_>, Device>::evalSubExprsIfNeeded(MakePointer_<Eigen::TensorEvaluator<const Eigen::TensorReductionOp<Op, Dims, ArgType, MakePointer_>, Device>::CoeffReturnType>::Type) [with Op=Eigen::internal::AvgPoolMeanReducer<double>, Dims=const Eigen::IndexList<Eigen::type2index<1L>, Eigen::type2index<2L>>, ArgType=const Eigen::TensorImagePatchOp<-1L, -1L, const Eigen::TensorLayoutSwapOp<const Eigen::TensorMap<Eigen::Tensor<const double, 4, 1, Eigen::DenseIndex>, 16, Eigen::MakePointer>>>, MakePointer_=Eigen::MakePointer, Device=Eigen::GpuDevice]"
        external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorMorphing.h(172): here
        instantiation of "__nv_bool Eigen::TensorEvaluator<const Eigen::TensorReshapingOp<NewDimensions, ArgType>, Device>::evalSubExprsIfNeeded(Eigen::TensorEvaluator<const Eigen::TensorReshapingOp<NewDimensions, ArgType>, Device>::CoeffReturnType *) [with NewDimensions=const Eigen::DSizes<Eigen::DenseIndex, 4>, ArgType=const Eigen::TensorReductionOp<Eigen::internal::AvgPoolMeanReducer<double>, const Eigen::IndexList<Eigen::type2index<1L>, Eigen::type2index<2L>>, const Eigen::TensorImagePatchOp<-1L, -1L, const Eigen::TensorLayoutSwapOp<const Eigen::TensorMap<Eigen::Tensor<const double, 4, 1, Eigen::DenseIndex>, 16, Eigen::MakePointer>>>, Eigen::MakePointer>, Device=Eigen::GpuDevice]"
        external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorAssign.h(146): here
        instantiation of "__nv_bool Eigen::TensorEvaluator<const Eigen::TensorAssignOp<LeftArgType, RightArgType>, Device>::evalSubExprsIfNeeded(Eigen::TensorEvaluator<const Eigen::TensorAssignOp<LeftArgType, RightArgType>, Device>::Scalar *) [with LeftArgType=Eigen::TensorLayoutSwapOp<Eigen::TensorMap<Eigen::Tensor<double, 4, 1, Eigen::DenseIndex>, 16, Eigen::MakePointer>>, RightArgType=const Eigen::TensorReshapingOp<const Eigen::DSizes<Eigen::DenseIndex, 4>, const Eigen::TensorReductionOp<Eigen::internal::AvgPoolMeanReducer<double>, const Eigen::IndexList<Eigen::type2index<1L>, Eigen::type2index<2L>>, const Eigen::TensorImagePatchOp<-1L, -1L, const Eigen::TensorLayoutSwapOp<const Eigen::TensorMap<Eigen::Tensor<const double, 4, 1, Eigen::DenseIndex>, 16, Eigen::MakePointer>>>, Eigen::MakePointer>>, Device=Eigen::GpuDevice]"
        external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorExecutor.h(422): here
        instantiation of "void Eigen::internal::TensorExecutor<Expression, Eigen::GpuDevice, Vectorizable, Tileable>::run(const Expression &, const Eigen::GpuDevice &) [with Expression=const Eigen::TensorAssignOp<Eigen::TensorLayoutSwapOp<Eigen::TensorMap<Eigen::Tensor<double, 4, 1, Eigen::DenseIndex>, 16, Eigen::MakePointer>>, const Eigen::TensorReshapingOp<const Eigen::DSizes<Eigen::DenseIndex, 4>, const Eigen::TensorReductionOp<Eigen::internal::AvgPoolMeanReducer<double>, const Eigen::IndexList<Eigen::type2index<1L>, Eigen::type2index<2L>>, const Eigen::TensorImagePatchOp<-1L, -1L, const Eigen::TensorLayoutSwapOp<const Eigen::TensorMap<Eigen::Tensor<const double, 4, 1, Eigen::DenseIndex>, 16, Eigen::MakePointer>>>, Eigen::MakePointer>>>, Vectorizable=false, Tileable=false]"
        external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorDevice.h(35): here
        instantiation of "Eigen::TensorDevice<ExpressionType, DeviceType> &Eigen::TensorDevice<ExpressionType, DeviceType>::operator=(const OtherDerived &) [with ExpressionType=Eigen::TensorLayoutSwapOp<Eigen::TensorMap<Eigen::Tensor<double, 4, 1, Eigen::DenseIndex>, 16, Eigen::MakePointer>>, DeviceType=tensorflow::GPUDevice, OtherDerived=Eigen::TensorReshapingOp<const Eigen::DSizes<Eigen::DenseIndex, 4>, const Eigen::TensorReductionOp<Eigen::internal::AvgPoolMeanReducer<double>, const Eigen::IndexList<Eigen::type2index<1L>, Eigen::type2index<2L>>, const Eigen::TensorImagePatchOp<-1L, -1L, const Eigen::TensorLayoutSwapOp<const Eigen::TensorMap<Eigen::Tensor<const double, 4, 1, Eigen::DenseIndex>, 16, Eigen::MakePointer>>>, Eigen::MakePointer>>]"
        ./tensorflow/core/kernels/avgpooling_op.h(42): here
        instantiation of "void tensorflow::functor::SpatialAvgPooling<Device, T>::operator()(const Device &, tensorflow::TTypes<T, 4, Eigen::DenseIndex>::Tensor, tensorflow::TTypes<T, 4, Eigen::DenseIndex>::ConstTensor, int, int, int, int, const Eigen::PaddingType &) [with Device=tensorflow::GPUDevice, T=double]"
        tensorflow/core/kernels/avgpooling_op_gpu.cu.cc(38): here

I'm not sure how concerned I should be about these. The build succeeds, but I wouldn't be at all surprised to eventually get weird results.

In this particular case, it looks like it's complaining because Eigen::GpuDevice::allocate_temp appears to be __host__ rather than __host__ __device__ (i.e. missing EIGEN_DEVICE_FUNC).
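[Editor's note: the pattern behind that warning can be reproduced in miniature. The names below are hypothetical, not Eigen's real declarations; the DEVICE_FUNC macro mimics EIGEN_DEVICE_FUNC, expanding to nothing under a host-only compiler, which is why this compiles cleanly outside nvcc.]

```cpp
#include <cstddef>
#include <cstdlib>

// Stand-in for EIGEN_DEVICE_FUNC: __host__ __device__ under nvcc,
// nothing for a host-only compiler.
#ifdef __CUDACC__
#define DEVICE_FUNC __host__ __device__
#else
#define DEVICE_FUNC
#endif

// Hypothetical host-only allocator, like GpuDevice::allocate_temp would be
// if it is indeed missing the DEVICE_FUNC annotation.
void* allocate_temp(std::size_t bytes) { return std::malloc(bytes); }

// A host+device function calling it, in the role of evalSubExprsIfNeeded.
// Under nvcc the call below is the site of "calling a __host__ function
// from a __host__ __device__ function is not allowed"; on the host path
// it is perfectly legal, which is why the build still succeeds.
DEVICE_FUNC void* eval_buffer(std::size_t bytes) {
  return allocate_temp(bytes);
}
```

The warning matters only if the function is ever actually called from device code; on the device path the call would be invalid, which is consistent with the worry about "weird results eventually".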
I fully admit that I could be misinterpreting that or otherwise misunderstanding something basic.

Should I be concerned about these?

Thanks!
---
Eric Klein
[email protected]

On Wed, Jun 19, 2019 at 5:21 PM Rasmus Munk Larsen <[email protected]> wrote:

Erik, does Artem's suggestion work for you?

On Wed, Jun 19, 2019 at 2:52 PM Artem Belevich <[email protected]> wrote:

On Wed, Jun 19, 2019 at 1:47 PM Rasmus Munk Larsen <[email protected]> wrote:

It looks like we broke the Eigen CUDA build on Mac. What do you think about his workaround?

---------- Forwarded message ---------
From: Eric Klein <[email protected]>
Date: Wed, Jun 19, 2019 at 1:39 PM
Subject: [eigen] Mac CUDA build failure question
To: <[email protected]>

Hello all,

I posted a question on the forums several days back, but I suspect that might not be the right place for it, so I'm trying the mailing list as well. I'll just repost here what I put in the forums; the link to that is https://forum.kde.org/viewtopic.php?f=74&t=161199

I'm trying to build Eigen on Mac for CUDA (using the nvcc compiler) and getting build errors. I understand the errors, and I have a change that lets me dodge the build failures, but I suspect it's not the right change for check-in, so I'm looking for feedback.

The issue I have is in Half.h: I wind up getting errors about a bunch of operators already being defined. The core issue is that on Mac, nvcc (the CUDA compiler) uses gcc as the host compiler, but gcc on Mac is built on top of clang.
Eigen seems to be implicitly assuming that the presence of clang implies the absence of CUDA (or at least the absence of nvcc CUDA support).

In my build I'm hitting this block:

    #if (defined(EIGEN_HAS_CUDA_FP16) && defined(EIGEN_CUDA_ARCH) && \
         EIGEN_CUDA_ARCH >= 530) || \
        (defined(EIGEN_HAS_HIP_FP16) && defined(HIP_DEVICE_COMPILE))
    #define EIGEN_HAS_NATIVE_FP16
    #endif

which results in EIGEN_HAS_NATIVE_FP16 being set, and so we wind up compiling in all the operators from Half.h:253-313. That's fine so far.

[Artem, replying inline:] This assumes device-side compilation.

[Eric, continued:] What happens next is that we hit this line:

    #if !defined(EIGEN_HAS_NATIVE_FP16) || EIGEN_COMP_CLANG // Emulate support for half floats

which is followed shortly afterward by (roughly) the same operator functions (but emulated), and I get errors because those operators were already defined above.

[Artem:] If Clang were the CUDA compiler, that would not be a problem; this implies that it's a CUDA compilation with NVCC. What puzzles me is how we ended up with EIGEN_COMP_CLANG defined for the *device* side of the compilation. I suspect it's a side effect of NVCC doing device-side preprocessing with clang but actually compiling with cicc, which is obviously not clang.

I guess what we need to do here is something like this:

    #if !defined(EIGEN_HAS_NATIVE_FP16) || (EIGEN_COMP_CLANG && !EIGEN_COMP_NVCC)

That, and a comment explaining what's going on.

If that does not help, it would be great to compile with '-keep -verbose' and check which compilation phase is failing and what exactly it is trying to compile.
--Artem

[Eric, continued:] So: my hack to work around this is to ensure that EIGEN_COMP_CLANG gets set to 0 in Macros.h if __NVCC__ is defined. That works fine for me locally and gets Eigen building (and thus unblocks me on getting TensorFlow to build for Mac, or at least gets me past this issue).

I'm willing to bet, however, that this is the wrong thing to do in general. I don't understand enough of what this second code block is doing to really see why clang is treated differently from nvcc here (and specifically why half support needs to be emulated in the presence of clang). I believe there is a version of clang that supports CUDA (at least on some platforms?). Presumably this branch is for that, but I don't know enough about how it differs from nvcc to fully grok this.

Can anyone help enlighten me about the best way to fix this?

Thanks!
---
Eric Klein
[email protected]

--
--Artem Belevich
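[Editor's note: to make the failure mode discussed in this thread concrete, here is a toy reconstruction, not Eigen's real half type. With the pre-fix guard, a Mac nvcc build has both blocks below active and the compiler rejects the second operator definition; the !COMP_NVCC term switches the second block off.]

```cpp
#include <cassert>

// Toy half type; the macros stand in for EIGEN_HAS_NATIVE_FP16,
// EIGEN_COMP_CLANG, and EIGEN_COMP_NVCC in the Mac-nvcc configuration
// (native FP16 present, clang detected as host compiler, nvcc driving).
struct half { float x; };

#define HAS_NATIVE_FP16 1
#define COMP_CLANG 1
#define COMP_NVCC 1

#if HAS_NATIVE_FP16
// The "native" operator block (Half.h:253-313 in the report).
inline half operator+(half a, half b) { return half{a.x + b.x}; }
#endif

#if !HAS_NATIVE_FP16 || (COMP_CLANG && !COMP_NVCC)  // the fixed guard
// The emulated operator block. Under the old guard,
// !HAS_NATIVE_FP16 || COMP_CLANG, this branch would also be compiled,
// redefining operator+ and producing the "already defined" errors.
inline half operator+(half a, half b) { return half{a.x + b.x}; }
#endif
```

With the fixed guard only the native block survives, so the translation unit compiles; flipping the guard back to the old form reproduces the redefinition error.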
