[ https://issues.apache.org/jira/browse/HDDS-1495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16837201#comment-16837201 ]
Elek, Marton commented on HDDS-1495:
------------------------------------

bq. Anu Engineer I put up a design document to ensure that apache/ozone docker image is reproducible using Apache source code.

Please note that the apache/ozone images are reproducible even now, based on the voted and approved artifacts. What I like in the current approach is that the dockerhub automated build guarantees that the builds are reproducible: there is a link to the source repository, and the hadoop version shows the exact release version.

bq. The current arrangement depends on a third party source code to build apache/hadoop-runner image is unnatural and subject to GPL terms because it includes byteman. Put on my Apache member hat, the technical and political deficiency must be addressed in Ozone community source code.

1. AFAIK byteman is LGPL, not GPL. IANAL, but LGPL can be linked (it's Category X), and byteman is used only as a runtime tool (a Java agent); the Ozone source code doesn't depend on byteman in any way. That said, I agree to remove it. We may need a LEGAL confirmation on whether it's a violation. But again: I agree it's an important question, and thanks for starting the discussion about it.

2. This is the repository of the hadoop-runner image: https://github.com/apache/hadoop-docker-ozone It is part of the Hadoop project. The creation of the repository was discussed not just in the related jira but also on the dev mailing list: https://mail-archives.apache.org/mod_mbox/hadoop-hdfs-dev/201901.mbox/%3Cc0bce845-34b3-e8d3-e438-77f8a0c28be0%40apache.org%3E

bq. Elek, Marton k8s-dev profiles doesn't exist until a few days ago that I point out the deficiency in Hadoop binary tarball creation. Please don't try to build everything on your own. It would not be productive for the community.

1. First of all, I apologize if you feel the creation of the k8s-dev profile was just an offensive step taken as a result of this discussion. That was definitely not the intention, and I am sorry if it came across that way.

2.
The k8s-dev profile creation was part of a longer process. Originally we started to use skaffold, but it turned out it's not flexible enough to support multiple different environments. See HDDS-1412 and related jiras, especially HDDS-1384 and HDDS-829.

3. I like to work together with others. I know that I have strengths and weaknesses, and I enjoy it when others help me avoid my mistakes.

4. I read all of the comments during a debate and try to learn from the arguments of others (including yours). I am pretty sure that my view has been modified by your ideas. Please don't blame me for that...

5. For example, I think the k8s-dev and k8s-dev-push profile names are not very good. Here I think I am influenced by your idea: they should probably be docker-image-build and docker-push. Maybe in another place, but I think we need one place where we create the development docker images, and it should be usable both by kubernetes and by any other components.

bq. I also ran into a secondary issue that smoketest tries to download multi-gigabytes of third party docker images. It caused my vm development environment to run out of space (with 20 of 50GB free). I waited 45 minutes for the download, and have to spend half day to try to clean up the debris. I don't think bring in the entire Hadoop ecosystem to run test on one machine is the correct way to perform development. I would happily pay the 30 seconds initial wait time to build locally than 45 minutes download time of third party flokkr/hadoop images that we have no access to make modification.

1. Download time: Please open a separate issue for this. I agree that we need to improve this. I have had some experiments to reduce the size of the hadoop images (for example by removing the docs + aws dependencies).

2. 3rd party images: flokkr/hadoop is the only way to use older hadoop versions. As of now we have only apache/hadoop:2 (the latest release from the 2.x branch) and apache/hadoop:3.x images. I agree that we should improve it.
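To make the "more versioned apache/hadoop images" idea concrete, a version-parameterized build along these lines could rebuild any past release from the Apache archive. This is only a rough sketch: the base image, download location, and filesystem layout are my illustrative assumptions, not the project's actual Dockerfile.

```dockerfile
# Hypothetical sketch: one Dockerfile, parameterized by release version,
# so images for older hadoop releases can be (re)built the same way.
FROM openjdk:8-jre
ARG HADOOP_VERSION=2.9.2
# archive.apache.org keeps all past releases; this is the standard
# dist layout for hadoop release tarballs.
ADD https://archive.apache.org/dist/hadoop/common/hadoop-${HADOOP_VERSION}/hadoop-${HADOOP_VERSION}.tar.gz /tmp/hadoop.tar.gz
RUN tar -xzf /tmp/hadoop.tar.gz -C /opt \
 && ln -s /opt/hadoop-${HADOOP_VERSION} /opt/hadoop \
 && rm /tmp/hadoop.tar.gz
ENV HADOOP_HOME=/opt/hadoop
```

Each version would then be built with something like `docker build --build-arg HADOOP_VERSION=2.8.5 -t apache/hadoop:2.8.5 .` (the tag naming here is an assumption as well).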
I think the easiest way is to generate more apache/hadoop images with the current approach. This is blocked by https://issues.apache.org/jira/browse/HADOOP-16092 which was opened based on your comment, and I asked for your help in the last comment there.

3. While it is true that we can use apache/hadoop instead of flokkr/hadoop, this is not true for all the dependencies. We couldn't create docker images for all the downstream projects (spark, hive, tensorflow), but we need them to create scripted compatibility tests. I think it's acceptable to use 3rd party docker images (just not for hadoop).

> Create hadoop/ozone docker images with inline build process
> -----------------------------------------------------------
>
>                 Key: HDDS-1495
>                 URL: https://issues.apache.org/jira/browse/HDDS-1495
>             Project: Hadoop Distributed Data Store
>          Issue Type: New Feature
>            Reporter: Elek, Marton
>            Assignee: Eric Yang
>            Priority: Major
>         Attachments: HADOOP-16091.001.patch, HADOOP-16091.002.patch, HDDS-1495.003.patch, HDDS-1495.004.patch, HDDS-1495.005.patch, HDDS-1495.006.patch, HDDS-1495.007.patch, Hadoop Docker Image inline build process.pdf
>
> This is proposed by [~eyang] in [this|https://lists.apache.org/thread.html/33ac54bdeacb4beb023ebd452464603aaffa095bd104cb43c22f484e@%3Chdfs-dev.hadoop.apache.org%3E] mailing thread.
> {quote}1, 3. There are 38 Apache projects hosting docker images on Docker hub using Apache Organization. By browsing Apache github mirror, there are only 7 projects using a separate repository for docker image build. Popular projects' official images are not from Apache organization, such as zookeeper, tomcat, httpd. We may not disrupt what other Apache projects are doing, but it looks like inline build process is widely employed by majority of projects such as Nifi, Brooklyn, thrift, karaf, syncope and others. The situation seems a bit chaotic for Apache as a whole. However, Hadoop community can decide what is best for Hadoop.
> My preference is to remove ozone from source tree naming, if Ozone is intended to be subproject of Hadoop for long period of time. This enables Hadoop community to host docker images for various subprojects without having to check out several source trees to trigger a grand build. However, inline build process seems more popular than separated process. Hence, I highly recommend making docker build inline if possible.
> {quote}
> The main challenges are also discussed in the thread:
> {code:java}
> 3. Technically it would be possible to add the Dockerfile to the source
> tree and publish the docker image together with the release by the
> release manager but it's also problematic:
> {code}
> a) there is no easy way to stage the images for the vote
> c) it couldn't be flagged as automated on dockerhub
> d) it couldn't support the critical updates:
> * updating existing images (for example in case of an ssl bug, rebuilding all the existing images with exactly the same payload but an updated base image/os environment)
> * creating images for older releases (we would like to provide images for hadoop 2.6/2.7/2.8/2.9, especially for doing automatic testing with different versions)
>
> Issue a) can be solved (as [~eyang] suggested) by using a personal docker image during the vote and publishing it to dockerhub after the vote (in case the permission can be set by INFRA).
> Note: based on LEGAL-270 and the linked discussion, both approaches (inline build process / external build process) are compatible with the apache release.
> Note: HDDS-851 and HADOOP-14898 contain more information about these problems.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org