Re: proposed new repository for hadoop/ozone docker images (+update on docker works)

Eric Yang Thu, 31 Jan 2019 11:00:59 -0800

1, 3. There are 38 Apache projects hosting Docker images on Docker Hub under
the Apache organization.  Browsing the Apache GitHub mirror, I found only 7
projects that use a separate repository for the Docker image build.  The
official images of some popular projects, such as zookeeper, tomcat, and
httpd, are not published from the Apache organization.  We may not want to
disrupt what other Apache projects are doing, but it looks like an inline
build process is employed by the majority of projects, such as Nifi,
Brooklyn, thrift, karaf, syncope, and others.  The situation seems a bit
chaotic for Apache as a whole.  However, the Hadoop community can decide what
is best for Hadoop.  My preference is to remove ozone from the source tree
name if Ozone is intended to be a subproject of Hadoop for a long period of
time.  This enables the Hadoop community to host Docker images for various
subprojects without having to check out several source trees to trigger a
grand build.  However, the inline build process seems more popular than the
separated process.  Hence, I highly recommend making the Docker build inline
if possible.


2. I think the way to do this is to open an INFRA ticket; there are Jenkins
users who can configure the job to run on nodes that have the Apache repo
credentials.

4. The Docker image name maps to the Maven project name.  Hence, if the
project name is Hadoop-ozone, the image name follows the Maven artifact name
by convention, with an option to customize.  I think this is reasonable, and
the image is automatically tagged with the same Maven project version, which
minimizes version number management between Maven and Docker.
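
To illustrate the convention in point 4, here is a minimal Python sketch of
the mapping.  It is only an illustration under stated assumptions, not what
dockerfile-maven-plugin does internally: the function name image_reference,
the default namespace "apache", and the name_override parameter are all
hypothetical names chosen for this example.

```python
def image_reference(artifact_id, version, namespace="apache", name_override=None):
    """Build a Docker image reference from Maven coordinates.

    Hypothetical helper: the repository name defaults to the Maven
    artifact name, and the tag is the Maven project version.
    """
    # Docker repository names must be lowercase, while a Maven artifact
    # name such as "Hadoop-ozone" may contain uppercase letters.
    name = (name_override or artifact_id).lower()
    return f"{namespace}/{name}:{version}"


# Default convention: artifact name and version carry over unchanged.
print(image_reference("hadoop-ozone", "0.3.0"))  # prints apache/hadoop-ozone:0.3.0
```

With this shape, only the Maven version needs to be bumped for a release;
the Docker tag follows from it.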

Regards,
Eric

On 1/31/19, 8:59 AM, "Elek, Marton" <e...@apache.org> wrote:

    
    Hi Eric,
    
    Thanks for the answers.
    
    1.
    
    > Hadoop-docker-ozone.git source tree naming seems to create a unique
    process for Ozone.
    
    Not at all. We would like to follow the existing practice which was
    established in HADOOP-14898. In HDDS-851 we discussed why we need two
    separate repositories for hadoop/ozone: because of the limitation of the
    dockerhub branch/tag mapping.
    
    I am 100% open to switching to another approach. I would suggest
    creating a JIRA for that, as it requires code modification in the
    docker-hadoop-* branches.
    
    
    2.
    
    > Flagging automated build on dockerhub seems to conflict with Apache
    release policy.
    
    Honestly I don't know. It was discussed in HADOOP-14898 and the
    connected INFRA ticket and there were no arguments against it. Especially
    as we just followed the existing practice, which was started by other
    projects.
    
    Now that I have checked the docker related INFRA tickets again, it seems
    that we have had two other practices since then:
    
     1) build docker image on the jenkins (is it compliant?)
     2) get permission to push to the apache/... from local.
    
    You suggested the second one. Do you have more information on how it is
    possible? Who can request permission to push to apache/hadoop, for
    example, and how?
    
    
    3.
    
    From one point of view, publishing existing, voted releases in docker
    images is something like repackaging them. But you may be right that
    this is wrong and that they should be handled as separate releases.
    
    Do you know of any official ASF wiki/doc/mail discussion about managing
    docker images? If not, I would suggest creating a new wiki/doc, as it
    seems that we have no clear answer on which is the most compliant way to
    do it.
    
    4.
    
    Thank you for the suggestion to use dockerhub/own namespace to stage
    docker images during the build. Sounds good to me. But I also wrote about
    some other problems in my previous mail (3 b,c,d); this is just one
    (3/a). Do you have any suggestions to solve the other problems?
    
     * Updating existing images (for example in case of an ssl bug, rebuild
    all the existing images with exactly the same payload but updated base
    image/os environment)
    
     * Creating images for older releases (We would like to provide images
    for hadoop 2.6/2.7/2.8/2.9. Especially for doing automatic testing
    with different versions).
    
    Thanks a lot,
    Marton
    
    
    On 1/30/19 6:50 PM, Eric Yang wrote:
    > Hi Marton,
    > 
    > Flagging automated build on dockerhub seems to conflict with Apache
    > release policy.  The vote and release process are manual processes of the
    > Apache Way.  Therefore, the 3 b)-3 d) improvements will be out of reach
    > unless the policy changes.
    > 
    > YARN-7129 is straightforward: it uses dockerfile-maven-plugin to build
    > the docker image locally.  It also checks for the existence of
    > /var/run/docker.sock to ensure docker is running.  This allows the docker
    > image to be built in a developer sandbox, if the developer sandbox mounts
    > the host /var/run/docker.sock.  Maven deploy can configure the repository
    > location and authentication credentials using ~/.docker/config.json and
    > maven settings.xml.  This can upload a release candidate image to the
    > release manager's dockerhub account for the release vote.  Once the vote
    > passes, the image can be pushed to the official Apache dockerhub
    > repository by the release manager, or by an Apache Jenkins job that tags
    > the image and pushes it to the Apache account.
    > 
    > The Ozone image and the application catalog image are in a similar
    > situation: the test image can be built and tested locally.  The official
    > voted artifacts can be uploaded to the Apache dockerhub account.  Hence,
    > fewer variants of the same procedure would be great.
    > Hadoop-docker-ozone.git source tree naming seems to create a unique
    > process for Ozone.  I think it would be preferable to call it
    > Hadoop-docker.git, comprising all docker image builds, or to use the
    > dockerfile-maven-plugin approach.
    > 
    > Regards,
    > Eric
    > 
    > On 1/30/19, 12:56 AM, "Elek, Marton" <e...@apache.org> wrote:
    > 
    >     Thanks Eric for the suggestions.
    >     
    >     Unfortunately (as Anu wrote it) our use-case is slightly different.
    >     
    >     It was discussed in HADOOP-14898 and HDDS-851 but let me summarize the
    >     motivation:
    >     
    >     We would like to upload containers to the dockerhub for each release
    >     (eg: apache/hadoop:3.2.0)
    >     
    >     According to the Apache release policy, it's not allowed to publish
    >     snapshot builds (=not voted by PMC) outside of the developer community.
    >     
    >     1. We started to follow the pattern which is used by other Apache
    >     projects: docker containers are just different packaging of the
    >     already voted binary releases. Therefore we create the containers from
    >     the voted releases. (See [1] as an example)
    >     
    >     2. By separating the build of the source code and the docker image we
    >     get additional benefits: for example we can rebuild the images in case
    >     of a security problem in the underlying container OS. This is just a
    >     new empty commit on the branch and the original release will be
    >     repackaged.
    >     
    >     3. Technically it would be possible to add the Dockerfile to the
    >     source tree and publish the docker image together with the release by
    >     the release manager but it's also problematic:
    >     
    >       a) there is no easy way to stage the images for the vote
    >       b) we have no access to the apache dockerhub credentials
    >       c) it couldn't be flagged as automated on dockerhub
    >       d) It couldn't support the critical updates as I wrote in (2.).
    >     
    >     So the easy way that we found is to ask INFRA to register a branch on
    >     the dockerhub to use for the image creation. The build/packaging will
    >     be done by the dockerhub but only released artifacts will be included.
    >     Because of the limitation of the dockerhub in mapping between branch
    >     names and tags, we need a new repository instead of the branch (see
    >     the comments in HDDS-851 for more details).
    >     
    >     We also have a different use case: building developer images to create
    >     a test cluster. These images will never be uploaded to the hub. We
    >     have a Dockerfile in the source tree for this use case (see HDDS-872).
    >     And thank you very much for the hint, I will definitely check how
    >     YARN-7129 does it and will try to learn from it.
    >     
    >     Thanks,
    >     Marton
    >     
    >     
    >     [1]: https://github.com/apache/hadoop/tree/docker-hadoop-3
    >     
    >     
    >     
    >     On 1/30/19 2:50 AM, Anu Engineer wrote:
    >     > Marton, please correct me if I am wrong, but I believe that without
    >     > this branch it is hard for us to push to Apache DockerHub. This
    >     > allows for integration between the Apache account and DockerHub.
    >     > Does YARN publish to the Docker Hub via the Apache account?
    >     > 
    >     > 
    >     > Thanks
    >     > Anu
    >     > 
    >     > 
    >     > On 1/29/19, 4:54 PM, "Eric Yang" <ey...@hortonworks.com> wrote:
    >     > 
    >     >     Separating the Hadoop docker related build into a separate git
    >     >     repository is a slippery slope.  It is harder to synchronize
    >     >     changes between two separate source trees.  There is a multi-step
    >     >     process to build the jar, tarball, and docker images.  This might
    >     >     be problematic to reproduce.
    >     >     
    >     >     It would be best to arrange the code such that the docker image
    >     >     build process can be invoked as part of the maven build process.
    >     >     The profile is activated only if docker is installed and running
    >     >     in the environment.  This allows producing the jar, tarball, and
    >     >     docker images all at once without hindering the existing build
    >     >     procedure.
    >     >     
    >     >     YARN-7129 is one example: it makes a subproject in YARN that
    >     >     builds a docker image that can run in YARN.  It automatically
    >     >     detects the presence of docker and builds the docker image when
    >     >     docker is available.  If docker is not running, the subproject is
    >     >     skipped and the build proceeds to the next subproject.  Please try
    >     >     out the YARN-7129 style of build process, and see whether it is a
    >     >     possible solution to the docker image generation issue.  Thanks
    >     >     
    >     >     Regards,
    >     >     Eric
    >     >     
    >     >     On 1/29/19, 3:44 PM, "Arpit Agarwal" <aagar...@cloudera.com.INVALID> wrote:
    >     >     
    >     >         I’ve requested a new repo hadoop-docker-ozone.git in gitbox.
    >     >         
    >     >         
    >     >         > On Jan 22, 2019, at 4:59 AM, Elek, Marton <e...@apache.org> wrote:
    >     >         > 
    >     >         > 
    >     >         > 
    >     >         > TLDR;
    >     >         > 
    >     >         > I proposed to create a separated git repository for
    >     >         > ozone docker images in HDDS-851 (hadoop-docker-ozone.git)
    >     >         > 
    >     >         > If there are no objections in the next 3 days I will ask
    >     >         > an Apache Member to create the repository.
    >     >         > 
    >     >         > 
    >     >         > 
    >     >         > 
    >     >         > LONG VERSION:
    >     >         > 
    >     >         > In HADOOP-14898 multiple docker containers and helper
    >     >         > scripts are created for Hadoop.
    >     >         > 
    >     >         > The main goal was to:
    >     >         > 
    >     >         > 1.) help the development with easy-to-use docker images
    >     >         > 2.) provide official hadoop images to make it easy to
    >     >         > test new features
    >     >         > 
    >     >         > As of now we have:
    >     >         > 
    >     >         > - apache/hadoop-runner image (which contains the required
    >     >         > dependency but no hadoop)
    >     >         > - apache/hadoop:2 and apache/hadoop:3 images (to try out
    >     >         > latest hadoop from 2/3 lines)
    >     >         > 
    >     >         > The base image to run hadoop (apache/hadoop-runner) is
    >     >         > also heavily used for Ozone distribution/development.
    >     >         > 
    >     >         > The Ozone distribution contains docker-compose based
    >     >         > cluster definitions to start various types of clusters
    >     >         > and scripts to do smoketesting. (See HADOOP-16063 for
    >     >         > more details).
    >     >         > 
    >     >         > Note: I personally believe that these definitions help a
    >     >         > lot to start different types of clusters. For example it
    >     >         > could be tricky to try out router based federation as it
    >     >         > requires multiple HA clusters. But with a simple
    >     >         > docker-compose definition [1] it could be started in
    >     >         > under 3 minutes. (HADOOP-16063 is about creating these
    >     >         > definitions for various hdfs/yarn use cases)
    >     >         > 
    >     >         > As of now we have dedicated branches in the hadoop git
    >     >         > repository for the docker images (docker-hadoop-runner,
    >     >         > docker-hadoop-2, docker-hadoop-3). It turns out that a
    >     >         > separated repository would be more effective as the
    >     >         > dockerhub can use only full branch names as tags.
    >     >         > 
    >     >         > We would like to provide ozone docker images to make the
    >     >         > evaluation as easy as
    >     >         > 'docker run -d apache/hadoop-ozone:0.3.0', therefore in
    >     >         > HDDS-851 we agreed to create a separated repository for
    >     >         > the hadoop-ozone docker images.
    >     >         > 
    >     >         > If this approach works well we can also move out the
    >     >         > existing
    >     >         > docker-hadoop-2/docker-hadoop-3/docker-hadoop-runner
    >     >         > branches from hadoop.git to another separated
    >     >         > hadoop-docker.git repository.
    >     >         > 
    >     >         > Please let me know if you have any comments,
    >     >         > 
    >     >         > Thanks,
    >     >         > Marton
    >     >         > 
    >     >         > 1: see
    >     >         > https://github.com/flokkr/runtime-compose/tree/master/hdfs/routerfeder
    >     >         > as an example
    >     >         > 
    >     >         > ---------------------------------------------------------------------
    >     >         > To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
    >     >         > For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
    >     >         > 
    >     >         
    >     >         
    >     >         
    >     >         
    >     >     
    >     >     
    >     >     ---------------------------------------------------------------------
    >     >     To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
    >     >     For additional commands, e-mail: common-dev-h...@hadoop.apache.org
    >     >     
    >     > 
    >     
    >     
