[ https://issues.apache.org/jira/browse/HDDS-1495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16837201#comment-16837201 ]
Elek, Marton commented on HDDS-1495:
------------------------------------

bq. Anu Engineer I put up a design document to ensure that apache/ozone docker image is reproducible using Apache source code.

Please note that the apache/ozone images are reproducible even now, based on the voted and approved artifacts. What I like in the current approach is that the dockerhub automated build guarantees that the builds are reproducible: there is a link to the source repository, and the hadoop version shows the exact release version.

bq. The current arrangement depends on a third party source code to build apache/hadoop-runner image is unnatural and subject to GPL terms because it includes byteman. Put on my Apache member hat, the technical and political deficiency must be addressed in Ozone community source code.

1. AFAIK byteman is LGPL, not GPL. IANAL, but LGPL can be linked (it's Category X), and byteman is used only as a runtime tool (a Java agent); the Ozone source code doesn't depend on byteman in any way. That said, I agree to remove it. We may need a LEGAL confirmation on whether it's a violation. But again: I agree it's an important question, and thanks for starting the discussion about it.

2. This is the repository of the hadoop-runner image: https://github.com/apache/hadoop-docker-ozone It is part of the Hadoop project. The creation of the repository was discussed not just in the related jira but also on the dev mailing list: https://mail-archives.apache.org/mod_mbox/hadoop-hdfs-dev/201901.mbox/%3Cc0bce845-34b3-e8d3-e438-77f8a0c28be0%40apache.org%3E

bq. Elek, Marton k8s-dev profiles doesn't exist until a few days ago that I point out the deficiency in Hadoop binary tarball creation. Please don't try to build everything on your own. It would not be productive for the community.

1. First of all, I apologize if you feel the creation of the k8s-dev profile was just an offensive step taken as a result of this discussion. That was definitely not the intention, and I am sorry if it came across that way.

2.
The k8s-dev profile creation was part of a longer process. Originally we started to use skaffold, but it turned out it's not flexible enough to support multiple different environments. See HDDS-1412 and related jiras, especially HDDS-1384 and HDDS-829.

3. I like to work together with others. I know that I have strengths and weaknesses, and I enjoy it when others help me avoid my mistakes.

4. I read all of the comments during a debate and try to learn from the arguments of others (including yours). I am pretty sure that my view has been modified by your ideas. Please don't blame me for that...

5. For example, I think the k8s-dev and k8s-dev-push profile names are not very good. Here I think I am influenced by your idea: they should probably be docker-image-build and docker-push. Maybe in another place, but I think we need one place where we create the development docker images, and it should be usable both by kubernetes and by any other components.

bq. I also ran into a secondary issue that smoketest tries to download multi-gigabytes of third party docker images. It caused my vm development environment to run out of space (with 20 of 50GB free). I waited 45 minutes for the download, and have to spend half day to try to clean up the debris. I don't think bring in the entire Hadoop ecosystem to run test on one machine is the correct way to perform development. I would happily pay the 30 seconds initial wait time to build locally than 45 minutes download time of third party flokkr/hadoop images that we have no access to make modification.

1. Download time: Please open a separate issue for this. I agree that we need to improve this. I have had some experiments to reduce the size of the hadoop images (for example by removing the docs + aws dependencies).

2. 3rd party images: flokkr/hadoop is the only way to use older hadoop versions. As of now we have only apache/hadoop:2 (the latest release from the 2.x branch) and apache/hadoop:3.x images. I agree that we should improve it.
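To make the "more versioned apache/hadoop images" idea concrete, a version-parameterized build along these lines could rebuild any past release from the Apache archive. This is only a rough sketch: the base image, download location, and filesystem layout are my illustrative assumptions, not the project's actual Dockerfile.

```dockerfile
# Hypothetical sketch: one Dockerfile, parameterized by release version,
# so images for older hadoop releases can be (re)built the same way.
FROM openjdk:8-jre
ARG HADOOP_VERSION=2.9.2
# archive.apache.org keeps all past releases; this is the standard
# dist layout for hadoop release tarballs.
ADD https://archive.apache.org/dist/hadoop/common/hadoop-${HADOOP_VERSION}/hadoop-${HADOOP_VERSION}.tar.gz /tmp/hadoop.tar.gz
RUN tar -xzf /tmp/hadoop.tar.gz -C /opt \
 && ln -s /opt/hadoop-${HADOOP_VERSION} /opt/hadoop \
 && rm /tmp/hadoop.tar.gz
ENV HADOOP_HOME=/opt/hadoop
```

Each version would then be built with something like `docker build --build-arg HADOOP_VERSION=2.8.5 -t apache/hadoop:2.8.5 .` (the tag naming here is an assumption as well).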
I think the easiest way is to generate more apache/hadoop images with the current approach. This is blocked by https://issues.apache.org/jira/browse/HADOOP-16092 which was opened based on your comment, and I asked for your help in the last comment there.

3. While it is true that we can use apache/hadoop instead of flokkr/hadoop, this is not true for all the dependencies. We couldn't create docker images for all the downstream projects (spark, hive, tensorflow), but we need them to create scripted compatibility tests. I think it's acceptable to use 3rd party docker images (just not for hadoop).

> Create hadoop/ozone docker images with inline build process
> -----------------------------------------------------------
>
>                 Key: HDDS-1495
>                 URL: https://issues.apache.org/jira/browse/HDDS-1495
>             Project: Hadoop Distributed Data Store
>          Issue Type: New Feature
>            Reporter: Elek, Marton
>            Assignee: Eric Yang
>            Priority: Major
>         Attachments: HADOOP-16091.001.patch, HADOOP-16091.002.patch, HDDS-1495.003.patch, HDDS-1495.004.patch, HDDS-1495.005.patch, HDDS-1495.006.patch, HDDS-1495.007.patch, Hadoop Docker Image inline build process.pdf
>
> This is proposed by [~eyang] in [this|https://lists.apache.org/thread.html/33ac54bdeacb4beb023ebd452464603aaffa095bd104cb43c22f484e@%3Chdfs-dev.hadoop.apache.org%3E] mailing thread.
> {quote}1, 3. There are 38 Apache projects hosting docker images on Docker hub using Apache Organization. By browsing Apache github mirror, there are only 7 projects using a separate repository for docker image build. Popular projects' official images are not from Apache organization, such as zookeeper, tomcat, httpd. We may not disrupt what other Apache projects are doing, but it looks like inline build process is widely employed by majority of projects such as Nifi, Brooklyn, thrift, karaf, syncope and others. The situation seems a bit chaotic for Apache as a whole. However, Hadoop community can decide what is best for Hadoop.
> My preference is to remove ozone from source tree naming, if Ozone is intended to be subproject of Hadoop for long period of time. This enables Hadoop community to host docker images for various subprojects without having to check out several source trees to trigger a grand build. However, inline build process seems more popular than separated process. Hence, I highly recommend making docker build inline if possible.
> {quote}
> The main challenges are also discussed in the thread:
> {code:java}
> 3. Technically it would be possible to add the Dockerfile to the source
> tree and publish the docker image together with the release by the
> release manager but it's also problematic:
> {code}
> a) there is no easy way to stage the images for the vote
> c) it couldn't be flagged as automated on dockerhub
> d) it couldn't support the critical updates:
> * updating existing images (for example in case of an ssl bug, rebuilding all the existing images with exactly the same payload but an updated base image/os environment)
> * creating images for older releases (we would like to provide images for hadoop 2.6/2.7/2.8/2.9, especially for doing automatic testing with different versions)
>
> Issue a) can be solved (as [~eyang] suggested) by using a personal docker image during the vote and publishing it to dockerhub after the vote (in case the permission can be set by INFRA).
> Note: based on LEGAL-270 and the linked discussion, both approaches (inline build process / external build process) are compatible with the apache release.
> Note: HDDS-851 and HADOOP-14898 contain more information about these problems.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org