Just as a point of reference, I don't think that we get any pushback at MapR for not supporting RHEL 5, and that has been our policy for a few years now.
That experience should be pretty similar for Arrow, except that I would expect that new adoptions might be even more canted towards current versions.

On Tue, Sep 4, 2018 at 3:24 PM Wes McKinney <wesmck...@gmail.com> wrote:

> hi folks,
>
> Surfacing a JIRA discussion ([4]) to the mailing list for discussion.
>
> The manylinux1 ABI was developed to provide a mechanism for portable
> Python packages with pre-compiled binary extensions supporting C and
> C++, including C++11, on a wide variety of Linux distributions without
> need for distribution-specific packages. This is accomplished using
> RedHat's devtoolset-2, which performs selective static linking of
> symbols from libstdc++ that cause ABI conflicts when used on systems
> with older standard libraries.
>
> The base image for producing these binaries is specified in a Dockerfile
> [1].
>
> The problem that we are having is that some C++ libraries, notably
> Google's Abseil C++ library, require a version of glibc that is too
> new for RHEL5. By building with CentOS6 / RHEL6 as the base image, we
> would get a new enough glibc (version 2.12). But building against
> glibc 2.12 would leave behind the RHEL5 folks.
>
> There is the in-discussion manylinux2010 standard, which uses RHEL6 as
> a base, but it is not yet finalized or in production.
>
> Some modern C++ projects shipping to Python have already left behind
> the manylinux1 standard even though their Python binaries claim to
> implement the standard. Both PyTorch and TensorFlow are tagged as
> manylinux1 although they have a different ABI. See [2] for example and
> [3].
>
> In my view there are two paths forward, neither perfect:
>
> 1) Stick with the manylinux1 ABI and do not use thirdparty libraries
> requiring newer glibc
> 2) "Cheat" on manylinux1 by using centos6 instead of centos5 as the
> base image for the wheel builds. This is what PyTorch is doing
>
> Since centos5 / RHEL5 are already past EOL those would be the primary
> casualties, but I'm not sure how many users would be affected. My
> guess is that they represent a small minority of our users at this
> point. RedHat is offering extended support for RHEL5 through end of
> 2020 but those are probably fairly exceptional cases and unlikely
> (IMHO) to be working on the bleeding edge of Python data engineering.
>
> Personally I would like to go with Option 2 and hope that this
> particular Python packaging gets sorted out in the next 12-24 months
> as we've already suffered problems due to TensorFlow and PyTorch's
> non-conformity with the manylinux1 ABI.
>
> Interested in the opinions of others.
>
> - Wes
>
> [1]: https://github.com/pypa/manylinux/blob/master/docker/Dockerfile-x86_64
> [2]: https://github.com/NVIDIA/nvidia-docker/issues/348#issuecomment-288875848
> [3]: https://github.com/pypa/manylinux/issues/96
> [4]: https://issues.apache.org/jira/browse/ARROW-2461
>
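
For anyone who wants to see which side of the glibc line a given machine falls on, here is a minimal sketch in Python. It is an illustration only, not anything Arrow or pip ships: the helper names (parse_glibc_version, detect_glibc_version, can_load) are hypothetical, the 2.12 baseline is the CentOS6 figure from the quoted message, and the 2.5 baseline is my understanding of what CentOS5 (and so manylinux1) provides. It relies on os.confstr("CS_GNU_LIBC_VERSION"), which as far as I know is only meaningful on glibc-based Linux.

```python
import os
import re

# Assumed baselines: CentOS 5 ships glibc 2.5 (the manylinux1 target);
# CentOS 6 ships glibc 2.12 (the figure quoted in the thread).
MANYLINUX1_GLIBC = (2, 5)
CENTOS6_GLIBC = (2, 12)


def parse_glibc_version(text):
    """Turn a string such as 'glibc 2.12' into (2, 12); None if no match."""
    match = re.search(r"(\d+)\.(\d+)", text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def detect_glibc_version():
    """Best-effort lookup of the running glibc version; None if unavailable."""
    try:
        value = os.confstr("CS_GNU_LIBC_VERSION")
    except (AttributeError, ValueError, OSError):
        # os.confstr is missing or the name is unknown (e.g. non-glibc systems).
        return None
    return parse_glibc_version(value) if value else None


def can_load(built_against, system_glibc):
    """A binary built against glibc X generally needs a system glibc >= X."""
    return system_glibc is not None and system_glibc >= built_against


if __name__ == "__main__":
    found = detect_glibc_version()
    print("detected glibc:", found)
    print("true manylinux1 wheel loadable:", can_load(MANYLINUX1_GLIBC, found))
    print("centos6-built wheel loadable:", can_load(CENTOS6_GLIBC, found))
```

Versions are compared as integer tuples rather than strings, since a string comparison would wrongly rank "2.5" above "2.12". On a RHEL5-era system (glibc 2.5) the second check would come out false, which is exactly the group that Option 2 leaves behind.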