[ 
https://issues.apache.org/jira/browse/HDDS-1495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16869682#comment-16869682
 ] 

Eric Yang commented on HDDS-1495:
---------------------------------

{quote}Current approach: no tarball creation is required to test everything in 
docker{quote}

The current approach is risky from a reproducibility point of view.  It creates 
an Ozone home directory structure that lets the user play in the sandbox 
without creating a tarball.  Because this is a two-stage process, the user can 
contaminate the Ozone sandbox directory by running tests or other operations.  
Hence, the resulting tarball may contain additional debris if the user plays 
in the sandbox and then runs mvn package -Pdist without clean.

If mvn clean package -Pdist is performed, the user still has to spend the extra 
time to create the tarball before playing with the sandbox.  Taking the tarball 
snapshot first is much safer, because it ensures that no test results or pyc 
files get included in the tarball.
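
To make the contamination concern concrete, here is a minimal sketch of a check 
that lists debris inside a built tarball.  It is not part of the Ozone build; 
the function name find_debris and the pattern list are assumptions chosen for 
illustration only.

```python
import fnmatch
import tarfile

# Assumed patterns for files that should not ship in a release tarball.
# The real list would depend on what the tests and tools leave behind.
DEBRIS_PATTERNS = ("*.pyc", "*/__pycache__/*")


def find_debris(tar_path, patterns=DEBRIS_PATTERNS):
    """Return the sorted member names in the tarball that match a debris pattern."""
    with tarfile.open(tar_path) as tar:
        names = tar.getnames()
    return sorted(
        name
        for name in names
        if any(fnmatch.fnmatchcase(name, pattern) for pattern in patterns)
    )
```

A non-empty result for a tarball produced after playing in the sandbox would 
indicate that the sandbox leaked into the distribution.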

{quote}Proposed approach: tarball creation (which is slow) is REQUIRED to run 
docker tests{quote}

This is for sanitization, to ensure the user doesn't include additional debris 
when the docker build happens.  The tarball creation is an out-of-band process.  
The user can build the tarball once and focus on refining the docker image 
without recreating the tarball when the tarball requires no change.  The 5-7 
seconds spent including the tarball binary in the docker image ensure the test 
is carried out on a docker image that closely represents a distributed 
environment.  Keeping the binary external to the docker image creates many 
risks and limitations that hinder testing the docker image in a real 
distributed environment.

{quote}+1: the proposed approach has additional IO overhead, not just the tar 
file creation (copy tarball to the .m2/repository){quote}

This is a user choice.  The user can choose mvn install to populate the .m2 
cache, or use mvn package from the top level of the Hadoop project.  mvn 
package doesn't generate the additional IO overhead and keeps its output in 
the build directory.  Maven designed the build system this way to let the user 
produce modular component artifacts and work in small units without a full 
build.  A well-written Maven project structure allows a build from any of the 
submodules, which the Hadoop project has taken pride in maintaining for 7+ 
years.  The Ozone dist project twisted the meaning of the dist-layout-stitching 
script; it is the worst kind of example, copying artifacts from 
hadoop-ozone-objectstore-service-${HDDS_VERSION} modules without giving the 
user the choice to use the cached copy or the current build directory copy.  
This forces the developer into the only option of building the entire Ozone 
project from the top level of the Hadoop project.  The time wasted on building 
the entire project is what makes development non-productive.  I recommend 
becoming more proficient in Maven before making poor implementation choices 
for instant gratification that are a hazard to developers' health.

{quote}+2: the proposed approach didn't address the earlier comments (use newer 
base image in case of a security issue){quote}

The Maven assembly plugin does address your question on how to build with a 
newer OS image and a released binary.  Using Maven assembly to build the 
tarball allows the released binary tarball to be deployed to Maven Central.  
There is no need to rebuild the released Ozone tarball.  The user can simply 
rebuild the Docker image from the released Ozone tarball binary by specifying 
the version of the tarball to build with.  In my opinion, a modularized docker 
build is a much better solution than the current monolithic build, which is 
incapable of reproducing the dist project without a full build.

{quote}+3: the proposed approach didn't address the other comments (make it 
possible to provide images for earlier hadoop releases){quote}

The Maven remote fetching technique can be applied to download a released 
Hadoop tarball by specifying the remote URL to fetch.  This is a standard 
technique used by almost all modern build tools; I don't see a technical 
obstacle in the remote fetching solution.  Please elaborate on the actual 
problem rather than providing a blind statement that it doesn't work.
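
As an illustration of the remote fetching idea only (this is plain Python, not 
the Maven mechanism itself), a released tarball can be located and downloaded 
from its version number alone.  The archive base URL, the function names and 
the example version below are assumptions for the sketch, not something taken 
from the patch.

```python
import hashlib
import urllib.request

# Assumed location of released Hadoop tarballs; a mirror could be substituted.
ARCHIVE_BASE = "https://archive.apache.org/dist/hadoop/common"


def release_url(version, base=ARCHIVE_BASE):
    """Build the download URL for a released Hadoop binary tarball."""
    return f"{base}/hadoop-{version}/hadoop-{version}.tar.gz"


def fetch_release(version, dest, expected_sha512=None):
    """Download the release tarball to dest, optionally verifying its SHA-512."""
    digest = hashlib.sha512()
    with urllib.request.urlopen(release_url(version)) as resp, open(dest, "wb") as out:
        # Stream in 1 MiB chunks so the whole tarball is never held in memory.
        for chunk in iter(lambda: resp.read(1 << 20), b""):
            digest.update(chunk)
            out.write(chunk)
    if expected_sha512 is not None and digest.hexdigest() != expected_sha512.lower():
        raise ValueError(f"checksum mismatch for hadoop-{version}")
    return dest
```

For example, release_url("2.7.7") points at the 2.7.7 tarball, so an image for 
an older release could be built from the fetched file without rebuilding 
Hadoop from source.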

> Create hadoop/ozone docker images with inline build process
> -----------------------------------------------------------
>
>                 Key: HDDS-1495
>                 URL: https://issues.apache.org/jira/browse/HDDS-1495
>             Project: Hadoop Distributed Data Store
>          Issue Type: Sub-task
>            Reporter: Elek, Marton
>            Assignee: Eric Yang
>            Priority: Major
>         Attachments: HADOOP-16091.001.patch, HADOOP-16091.002.patch, 
> HDDS-1495.003.patch, HDDS-1495.004.patch, HDDS-1495.005.patch, 
> HDDS-1495.006.patch, HDDS-1495.007.patch, HDDS-1495.008.patch, Hadoop Docker 
> Image inline build process.pdf
>
>
> This is proposed by [~eyang] in 
> [this|https://lists.apache.org/thread.html/33ac54bdeacb4beb023ebd452464603aaffa095bd104cb43c22f484e@%3Chdfs-dev.hadoop.apache.org%3E]
>  mailing thread.
> {quote}1, 3. There are 38 Apache projects hosting docker images on Docker hub 
> using the Apache organization. Browsing the Apache github mirror, there are 
> only 7 projects using a separate repository for the docker image build. 
> Popular projects' official images are not from the Apache organization, such 
> as zookeeper, tomcat, httpd. We may not disrupt what other Apache projects 
> are doing, but it looks like the inline build process is widely employed by 
> the majority of projects such as Nifi, Brooklyn, thrift, karaf, syncope and 
> others. The situation seems a bit chaotic for Apache as a whole. However, the 
> Hadoop community can decide what is best for Hadoop. My preference is to 
> remove ozone from the source tree naming, if Ozone is intended to be a 
> subproject of Hadoop for a long period of time. This enables the Hadoop 
> community to host docker images for various subprojects without having to 
> check out several source trees to trigger a grand build. However, the inline 
> build process seems more popular than the separated process. Hence, I highly 
> recommend making the docker build inline if possible.
> {quote}
> The main challenges are also discussed in the thread:
> {code:java}
> 3. Technically it would be possible to add the Dockerfile to the source
> tree and publish the docker image together with the release by the
> release manager but it's also problematic:
> {code}
> a) there is no easy way to stage the images for the vote
>  c) it couldn't be flagged as automated on dockerhub
>  d) It couldn't support the critical updates.
>  * Updating existing images (for example in case of an ssl bug, rebuild
>  all the existing images with exactly the same payload but updated base
>  image/os environment)
>  * Creating image for older releases (We would like to provide images,
>  for hadoop 2.6/2.7/2.7/2.8/2.9. Especially for doing automatic testing
>  with different versions).
> Item a) can be solved (as [~eyang] suggested) by using a personal docker 
> image during the vote and publishing it to dockerhub after the vote (in case 
> the permission can be set by INFRA).
> Note: based on LEGAL-270 and the linked discussion, both approaches (inline 
> build process / external build process) are compatible with the apache 
> release.
> Note: HDDS-851 and HADOOP-14898 contain more information about these 
> problems.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
