[ 
https://issues.apache.org/jira/browse/HADOOP-16091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16833654#comment-16833654
 ] 

Elek, Marton edited comment on HADOOP-16091 at 5/6/19 9:41 AM:
---------------------------------------------------------------

bq. This is how maven is designed to allow each sub-module to build 
independently. This allows reducing iteration time on each component instead of 
doing the full build each time. The k8s-dev solution has a conflict of 
interests in maven design. Part of Maven design is to release one binary per 
project using maven release:release plugin.

I wouldn't like to start an additional thread here, but I think the 
release:release goal is not part of the fundamental design of Maven. It is 
just a Maven plugin, which can be replaced by a better release plugin or by 
other processes. (I would say that lifecycle/goal bindings or profiles are part 
of the design.)

BTW I think the "release:release" plugin has some design problems, but that's 
another story. (I prefer not to use it, for example, because the created tags 
are pushed too early.)

bq. By using the tar layout stitching temp space, it saves space during the 
build. However, it creates a inseparable process for building tarball and 
docker in maven because the temp directory is not in maven cache. This means 
the tarball and docker image must be built together, and only one of them can 
be deposited to maven repository. Hence, it takes more time to reiterate just 
the docker part. It is not good for developer that only work on docker and not 
the tarball. 

I am not sure I understood your concern, but I think tar file creation and 
docker image creation can easily be separated by moving the tar file creation 
to the dist profile and keeping the k8s-dev profile as is. I am +1 for this 
suggestion.

Do you have any other problem with the k8s-dev approach?
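For illustration, the separation could be sketched roughly like this (the profile wiring and the assembly descriptor path are assumptions for the sketch, not the actual patch):

```xml
<!-- Hypothetical sketch: bind the tar assembly only in the dist profile,
     leaving the docker image build in k8s-dev untouched -->
<profiles>
  <profile>
    <id>dist</id>
    <build>
      <plugins>
        <plugin>
          <groupId>org.apache.maven.plugins</groupId>
          <artifactId>maven-assembly-plugin</artifactId>
          <executions>
            <execution>
              <id>dist-tar</id>
              <phase>package</phase>
              <goals><goal>single</goal></goals>
              <configuration>
                <descriptors>
                  <descriptor>src/main/assemblies/ozone-dist.xml</descriptor>
                </descriptors>
              </configuration>
            </execution>
          </executions>
        </plugin>
      </plugins>
    </build>
  </profile>
  <profile>
    <id>k8s-dev</id>
    <!-- docker image build stays here, unchanged -->
  </profile>
</profiles>
```

With this split, `mvn package -Pdist` produces only the tarball and `mvn package -Pk8s-dev` only the image, so iterating on the docker part doesn't require rebuilding the tarball.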

bq. Symlink can be used to make /opt/ozone > /opt/ozone-${project.version}. 
This is the practice that Hadoop use to avoid versioned directory while 
maintain ability to swap binaries. We should keep symlink practice for config 
files to reference version neutral location. I think we have agreement on the 
base image. This also allow us to use RUN directive to make any post tarball 
process required in docker build.

Let me be more precise: Hadoop doesn't use symlinks, AFAIK. Ambari, Bigtop, 
and the Hortonworks/Cloudera distributions use symlinks to manage multiple 
versions of Hadoop.

Sorry if this seems pedantic. I learned from [Eugenia 
Cheng|http://eugeniacheng.com/math/books/] that the difference between pedantry 
and precision is illumination. I mention it only because I think it's very 
important that symlinks were introduced to manage versions in *on-prem* 
clusters.

I think the containerized world is different. For multiple versions we need to 
use different containers, so we don't need to add the version *inside* the 
containers any more.
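As a small illustration of the difference (the version number and paths are made up for the sketch):

```shell
# On-prem convention: versioned install directory plus a version-neutral
# symlink, so binaries can be swapped by repointing the link.
# (Using a temp dir instead of /opt so the sketch is runnable anywhere.)
base=$(mktemp -d)
mkdir "$base/ozone-0.4.0"
ln -s "$base/ozone-0.4.0" "$base/ozone"
readlink "$base/ozone"   # prints the versioned target directory

# Containerized: the version lives in the image tag instead, so the path
# inside every image can simply stay /opt/ozone:
#   docker run apache/ozone:0.4.0 ...
```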

Usually I don't think examples are good arguments (I think it's more important 
to find the right solution than to follow existing practices), but I checked 
the Spark images (which can be created with bin/docker-image-tool.sh from the 
Spark distribution) and they also use /opt/spark. (But I am fine with using 
/opt/apache/ozone if you prefer it. I like the apache subdir.)
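A minimal Dockerfile along these lines could look as follows (the base image name and tarball name are placeholders, not the actual build):

```dockerfile
# Hypothetical sketch: version-neutral install path inside the image
FROM apache/hadoop-runner
# ADD auto-extracts a local tar.gz into the target directory
ADD ozone-0.4.0.tar.gz /opt/
# RUN directives can do any post-tarball processing at image build time
RUN mv /opt/ozone-0.4.0 /opt/ozone
WORKDIR /opt/ozone
```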



> Create hadoop/ozone docker images with inline build process
> -----------------------------------------------------------
>
>                 Key: HADOOP-16091
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16091
>             Project: Hadoop Common
>          Issue Type: New Feature
>            Reporter: Elek, Marton
>            Assignee: Eric Yang
>            Priority: Major
>         Attachments: HADOOP-16091.001.patch, HADOOP-16091.002.patch
>
>
> This is proposed by [~eyang] in 
> [this|https://lists.apache.org/thread.html/33ac54bdeacb4beb023ebd452464603aaffa095bd104cb43c22f484e@%3Chdfs-dev.hadoop.apache.org%3E]
>  mailing thread.
> {quote}1, 3. There are 38 Apache projects hosting docker images on Docker hub 
> using Apache Organization. By browsing Apache github mirror. There are only 7 
> projects using a separate repository for docker image build. Popular projects 
> official images are not from Apache organization, such as zookeeper, tomcat, 
> httpd. We may not disrupt what other Apache projects are doing, but it looks 
> like inline build process is widely employed by majority of projects such as 
> Nifi, Brooklyn, thrift, karaf, syncope and others. The situation seems a bit 
> chaotic for Apache as a whole. However, Hadoop community can decide what is 
> best for Hadoop. My preference is to remove ozone from source tree naming, if 
> Ozone is intended to be subproject of Hadoop for long period of time. This 
> enables Hadoop community to host docker images for various subproject without 
> having to check out several source tree to trigger a grand build. However, 
> inline build process seems more popular than separated process. Hence, I 
> highly recommend making docker build inline if possible.
> {quote}
> The main challenges are also discussed in the thread:
> {code:java}
> 3. Technically it would be possible to add the Dockerfile to the source
> tree and publish the docker image together with the release by the
> release manager but it's also problematic:
> {code}
> a) there is no easy way to stage the images for the vote
>  c) it couldn't be flagged as automated on dockerhub
>  d) It couldn't support the critical updates.
>  * Updating existing images (for example in case of an ssl bug, rebuild
>  all the existing images with exactly the same payload but updated base
>  image/os environment)
>  * Creating image for older releases (We would like to provide images,
>  for hadoop 2.6/2.7/2.7/2.8/2.9. Especially for doing automatic testing
>  with different versions).
> The a) can be solved (as [~eyang] suggested) with using a personal docker 
> image during the vote and publish it to the dockerhub after the vote (in case 
> the permission can be set by the INFRA)
> Note: based on LEGAL-270 and linked discussion both approaches (inline build 
> process / external build process) are compatible with the apache release.
> Note: HDDS-851 and HADOOP-14898 contains more information about these 
> problems.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
