Hello MXNet devs,

I'd like to start a thread discussing what our build system should look
like in MXNet 2.0.  I'd propose that, although the current make system has
served us well in the past, we remove it along with the bump to 2.0.  The
end goal I'd like to see is a clean build system, free of a pile of
conditional logic, that makes contributing to and testing MXNet a simpler
process.  Additionally I'd propose we target a minimum cmake version of 3.7
for reasons described below.

First I'd like to give some context on why I'd propose that we not only
switch to cmake, but also target a relatively new version of it (version
3.7, from Nov 2016).  The largest benefits of this change would apply to
CUDA builds, where cmake's functionality is quite inconsistent between
versions.  One persistent annoyance I've had with cmake is our conditional
logic around the FindCUDA module.  At one point it targeted some modern
cmake features, but subsequent versions of cmake tweaked the way these
features work, and I now find them consistently broken, to the point that I
need a bunch of -D defines to compile properly or to use an IDE.  An
additional CUDA-related issue is that every time NVCC adds support for a
new SM architecture we have to make a few source changes to support it.  I
could see this being problematic for users who may suddenly realize that,
due to their compilation settings, they are not actually enabling the
features they think they are on their shiny new GPUs.

As an alternative, if we target cmake 3.7 at a minimum and want to find
CUDA and then build a list of reasonable PTX/BINS, we could use the
following commands[1]:

----
find_package(CUDA ...)
...
CUDA_SELECT_NVCC_ARCH_FLAGS(ARCH_FLAGS 3.0 3.5+PTX 5.2(5.0) Maxwell)
LIST(APPEND CUDA_NVCC_FLAGS ${ARCH_FLAGS})
----

Simple, concise, and it would help to make the building experience more
consistent across platforms, build environments and IDEs (looking at you,
CLion).  We'd of course need to do a little experimentation to make sure
that this does indeed work as intended, and can replace the complex
FindCUDA logic we currently have in our build systems, but for the sake of
the proposal let's assume these cmake commands do indeed work consistently
as documented from cmake 3.7 onwards.
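
As a rough, untested sketch of how this could sit at the top of a
CMakeLists.txt: the project name (mxnet_sketch) is made up for
illustration, and I'm using the "Auto" option from the FindCUDA docs[1],
which detects the GPUs on the build machine, instead of an explicit list.

----
# Refuse to configure with anything older than the proposed minimum.
cmake_minimum_required(VERSION 3.7)
project(mxnet_sketch CXX)  # hypothetical project name

# FindCUDA is a module, so it is loaded through find_package.
find_package(CUDA REQUIRED)

# "Auto" asks FindCUDA to detect the compute arch of the local GPUs.
cuda_select_nvcc_arch_flags(ARCH_FLAGS Auto)
list(APPEND CUDA_NVCC_FLAGS ${ARCH_FLAGS})
message(STATUS "NVCC arch flags: ${ARCH_FLAGS}")
----

For release binaries we'd presumably still want an explicit architecture
list like the one above, since Auto only reflects the machine doing the
build.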

To give users a chance to update their tooling I'd also suggest we begin
warning users at least a release in advance that make-based builds are
deprecated and will be removed in MXNet 2.0, so they can begin migrating to
cmake.  I'd also want to display deprecation messages for unused cmake
flags (such as the profiler flag) for a release before the 2.0 release, and
then remove them in 2.0.
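
A deprecation notice along these lines could be a few lines of cmake.  This
is only a sketch: USE_PROFILER is my assumed name for the profiler flag,
and the message wording is a placeholder.

----
# USE_PROFILER is an assumed name for the unused profiler flag.
if(DEFINED USE_PROFILER)
  # Whether DEPRECATION messages are shown as warnings or raised as errors
  # is governed by CMAKE_WARN_DEPRECATED and CMAKE_ERROR_DEPRECATED.
  message(DEPRECATION
    "USE_PROFILER is no longer used and will be removed in MXNet 2.0.")
endif()
----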

Of course not all users have cmake 3.7 on their systems; some of our
employers force us to use ridiculously outdated Linux distributions.  The
good news for these users is that if we offer Docker compilation with an
image that has a supported version of cmake, we should be able to build a
portable binary that works even with very old distributions of Linux.
Additionally, installing cmake from source is fairly straightforward [2]
and works quite well on older distros in my experience.

Looking forward to hearing what others think.  Any preferred build systems
that you all would want to use?  Is cmake the right system to centralize
on?  If so, is version 3.7 a reasonable minimum version to target?  Is the
2.0 release a good point at which we can think about simplifying build
logic?

1: https://cmake.org/cmake/help/v3.7/module/FindCUDA.html
2: https://github.com/Kitware/CMake
