Thank you, Steve, for the feedback. Would you be open to using settings.xml to
set a per-developer flag, so that developers don't need to type -DskipDocker on
every build? In settings.xml, a user can customize:
<profile>
  <id>skip.docker.by.default</id>
  <activation>
    <activeByDefault>true</activeByDefault>
  </activation>
  <properties>
    <skipDocker>true</skipDocker>
  </properties>
</profile>
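For completeness, a minimal ~/.m2/settings.xml wrapping that profile might look like the following (the profile id and property are taken from above; the surrounding elements are standard settings.xml boilerplate):

```xml
<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0">
  <profiles>
    <profile>
      <id>skip.docker.by.default</id>
      <activation>
        <activeByDefault>true</activeByDefault>
      </activation>
      <properties>
        <!-- plain "mvn clean install" now behaves like -DskipDocker=true -->
        <skipDocker>true</skipDocker>
      </properties>
    </profile>
  </profiles>
</settings>
```

A developer who does want the Docker build can simply omit this profile from their settings, or override the property on the command line with -DskipDocker=false.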
This shortens build time for developers who do not touch any of the Docker
work and want to avoid typing -DskipDocker every time, while developers whose
projects require frequent Docker image builds can keep the Docker build
active. My second concern with a profile-based build process is that it
introduces maintenance cost in Jenkins and conflicts with the Maven release
procedure. We would end up with multiple Jenkins jobs, each with different
profiles toggled, for each major branch of Hadoop. When a new profile is
introduced, someone needs to change Jenkins to ensure the committed code
doesn't slip through untested, e.g. yarn-ui, docker etc. It makes the Jenkins
build configuration more fragile to maintain and non-backward compatible.
Several Apache projects have release builds on builds.apache.org, such as
Mesos-Release, and this is achievable because they don't introduce build flags
into "mvn release" across versions. Hadoop does not have an automated release
build for exactly this reason: our project is heavily flag dependent. It takes
some self-discipline to avoid making a mess of the project's continuous
integration effort.
I feel Hadoop is moving in the wrong direction in this area, which prompted
this discussion to see whether we can reduce reliance on profiles for artifact
builds. When I was at IBM developing for Hadoop, every Hadoop-related project
used the maven-release-plugin to automate the build process in Jenkins. It
opened my eyes to simplifying release procedures and reducing the manpower
needed to maintain the build system. Open source Hadoop builds can benefit
from shedding these bad habits and making the build environment lighter to
maintain.
Alternatively, I can add a profile called docker. The Docker build would be
optional, and we would need to change the builds.apache.org pre-commit builds
to add this flag for trunk projects to ensure Docker artifacts are tested. I
consider this a last resort because it encourages the bad habits that spoil
developers. Your reconsideration is appreciated. Thanks
Regards,
Eric
From: Steve Loughran <[email protected]>
Date: Monday, March 18, 2019 at 3:36 AM
To: Eric Yang <[email protected]>
Cc: Hadoop Common <[email protected]>, "[email protected]"
<[email protected]>, Hdfs-dev <[email protected]>, Eric
Badger <[email protected]>, Eric Payne <[email protected]>,
Jonathan Eagles <[email protected]>, Jim Brennan
<[email protected]>, "Elek, Marton" <[email protected]>
Subject: Re: [DISCUSS] Docker build process
I'm not enthusiastic about making the docker build process mandatory. It's bad
enough having to remember to type -DskipShade to avoid a 5-10 minute delay
every time I do a build of a different branch, which I have to do every single
time I change from one PR to another.
I do not see why the docker build needs to be forced onto everyone. If I had a
choice, I'd make every patch which went near hadoop-common be retested against
the real object stores of hadoop-aws, hadoop-azure, etc. But I don't do that
and instead get to find out when some change has accidentally broken those
builds (usually artifact updates, bouncy-castle being needed by miniyarn,
etc). I'm OK with that. So why can't the people who want the docker build turn
it on?
> Do we want to have inline docker build process in maven?
Yes to the build, -1 to it being mandatory on a normal build-the-JARs "mvn
clean install -DskipTests" run.
> If yes, it would be developer’s responsibility to pass -DskipDocker flag to
> skip docker. Docker is mandatory for default build.
see above
> If no, what is the release flow for docker images going to look like?
the -Pdist profile exists for releasing things right now.
On Wed, Mar 13, 2019 at 10:24 PM Eric Yang <[email protected]> wrote:
Hi Hadoop developers,
In recent months, there have been various discussions on creating a docker
build process for Hadoop. The mailing list converged last month on making the
docker build process inline, when the Ozone team was planning a new repository
for Hadoop/Ozone docker images. New feature work has started to add the docker
image build process inline in the Hadoop build.
A few lessons were learnt from making the docker build inline in YARN-7129.
The build environment must have docker for a successful docker build.
BUILD.txt states that for an easy build environment, use Docker. There is
logic in place to ensure that the absence of docker does not trigger the
docker build. The inline process tries to be as non-disruptive as possible to
existing development environments, with one exception: if docker's presence is
detected but the user does not have rights to run docker, the build will fail.
Now, some developers are pushing back on the inline docker build process
because the existing environment did not make the docker build mandatory.
However, there are benefits to the inline docker build process:
1. Source code tag, maven repository artifacts and docker hub artifacts can
all be produced in one build.
2. Less manual labor to tag different source branches.
3. Reduce intermediate build caches that may exist in multi-stage builds.
4. Release engineers and developers do not need to search a maze of build
flags to acquire artifacts.
The disadvantages are:
1. Require developer to have access to docker.
2. Default build takes longer.
There are workarounds for the above disadvantages: use the -DskipDocker flag
to avoid the docker build completely, or -pl !modulename to bypass
subprojects.
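For illustration, the two workarounds would look like this on the command line (the module name below is hypothetical, not an actual Hadoop module):

```shell
# Skip the docker build entirely via the skip flag
mvn clean install -DskipTests -DskipDocker

# Or exclude a specific docker-building subproject from the reactor
# (note the quoting so the shell does not expand "!")
mvn clean install -DskipTests -pl '!hadoop-docker-images'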
Hadoop development has not followed Maven best practice, because a full Hadoop
build requires a number of profile and configuration parameters. Some
evolutions work against Maven's design and require forking separate source
trees for different subprojects and pom files. Maven best practice
(https://dzone.com/articles/maven-profile-best-practices) explains that
profiles should not be used to trigger different artifact builds, because this
pattern introduces artifact naming conflicts in the Maven repository. Maven
instead offers flags to skip certain operations, such as -DskipTests,
-Dmaven.javadoc.skip=true, -pl, or -DskipDocker. It seems worthwhile to make
some corrections so the Hadoop build follows best practice.
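As a sketch of the skip-flag pattern (the plugin binding, image name, and phase below are illustrative, not the actual Hadoop pom), a docker build step can honor -DskipDocker without any profile:

```xml
<properties>
  <!-- overridden on the command line with -DskipDocker (sets it to true) -->
  <skipDocker>false</skipDocker>
</properties>

<plugin>
  <groupId>org.codehaus.mojo</groupId>
  <artifactId>exec-maven-plugin</artifactId>
  <executions>
    <execution>
      <id>docker-build</id>
      <phase>package</phase>
      <goals>
        <goal>exec</goal>
      </goals>
      <configuration>
        <!-- the skip flag makes the step optional, no profile needed -->
        <skip>${skipDocker}</skip>
        <executable>docker</executable>
        <arguments>
          <argument>build</argument>
          <argument>-t</argument>
          <argument>hadoop/example:${project.version}</argument>
          <argument>.</argument>
        </arguments>
      </configuration>
    </execution>
  </executions>
</plugin>
```

Because the step runs in the default lifecycle, a plain "mvn clean install" always exercises it, and CI never needs a profile toggled to keep the docker artifacts tested.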
Some developers have advocated for a separate build process for docker images.
We need consensus on the direction that will work best for the Hadoop
development community. Hence, my questions are:
Do we want to have inline docker build process in maven?
If yes, it would be developer’s responsibility to pass -DskipDocker flag to
skip docker. Docker is mandatory for default build.
If no, what is the release flow for docker images going to look like?
Thank you for your feedback.
Regards,
Eric