[
https://issues.apache.org/jira/browse/HADOOP-13397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15388200#comment-15388200
]
Allen Wittenauer commented on HADOOP-13397:
-------------------------------------------
A couple of things:
a) I, and I know others as well, have some rather large licensing questions
around Docker images. They effectively act as a binary distribution and it is
very much against ASF rules to distribute GPL and other Category X components.
It makes me extremely uncomfortable to move forward without some clarification
from legal. (Yes, I know other ASF projects are publishing images on docker
hub. Hopefully that means that there is a JIRA issue in the LEGAL project to
point to.) This is a blocking issue that really needs to get clarified before
further time investment.
b) I'm going to change the description in this issue from "Official image from
Cloudera" to "Cloudera's image". Cloudera can't make an "official image" for
Apache Hadoop, so let's clear up any potential confusion before it starts.
c) Is this actually useful in reality? The vast vast vast majority of Apache
Hadoop deployments add a wide variety of additional components on top of Apache
Hadoop to the point that even making a base image still seems like it wouldn't
be particularly usable without downstream conflict resolution. It may be useful
to make Dockerfile templates, but full blown images? Hmm.. I'm going to need
some convincing.
d) Having worked with the existing Dockerfile and ported it over to support the
ASF PowerPC build machines (HADOOP-13329), we need to be aware that we're going
to need more than one Dockerfile: one per hardware platform. We made the mistake
of assuming a single one with start-build-env.sh (which we'll fix as part of
HADOOP-13329), but we should avoid it here. (We've gotten some poking from the
ARM64 folks as well.)
e) This is going to hit upon the larger issue of distributed configuration
management, which is going to be extremely tricky to make consumable, never
mind what types of configurations are actually supported: security? persistent
storage? Then there are client configs--which, it's worthwhile pointing out,
not even the vendor tools handle particularly well.
f) I think a much more attainable goal to start is making a single Dockerfile
that runs all of the Apache Hadoop daemons as a single node configuration.
That's a highly desirable thing to have for a variety of reasons. If there is
still heavy interest in breaking it apart, it gives a base working example
before proceeding further to tease out the various daemons.
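To illustrate point d), here is a minimal sketch of what selecting a Dockerfile
per hardware platform might look like. Everything in it is an assumption made
for illustration: the function name dockerfile_for_arch, the DOCKERFILES table,
and the Dockerfile.* file names are hypothetical and do not exist in the Hadoop
tree.

```python
import platform

# Hypothetical mapping from machine architecture to a per-platform Dockerfile.
# The file names are illustrative only; they are not files in the Hadoop tree.
DOCKERFILES = {
    "x86_64": "Dockerfile.x86_64",
    "amd64": "Dockerfile.x86_64",
    "ppc64le": "Dockerfile.ppc64le",
    "aarch64": "Dockerfile.aarch64",
    "arm64": "Dockerfile.aarch64",
}


def dockerfile_for_arch(machine=None):
    """Return the Dockerfile name for an architecture (default: this host's)."""
    if machine is None:
        # platform.machine() reports the host architecture, e.g. "x86_64".
        machine = platform.machine()
    try:
        return DOCKERFILES[machine.lower()]
    except KeyError:
        # Fail loudly instead of silently falling back to the x86_64 file.
        raise ValueError(f"no Dockerfile for architecture: {machine!r}") from None
```

Failing on an unknown architecture, rather than defaulting to one file, is the
point of the sketch: a single implicit default is the mistake described in d).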
> Add dockerfile for Hadoop
> -------------------------
>
> Key: HADOOP-13397
> URL: https://issues.apache.org/jira/browse/HADOOP-13397
> Project: Hadoop Common
> Issue Type: Bug
> Reporter: Klaus Ma
>
> For now, there's no community version Dockerfile in Hadoop; most Docker
> images are provided by vendors, e.g.
> 1. Official image from Cloudera is the quickstart image:
> https://hub.docker.com/r/cloudera/quickstart/
> 2. From HortonWorks sequenceiq:
> https://hub.docker.com/r/sequenceiq/hadoop-docker/
> 3. MapR provides the mapr-sandbox-base:
> https://hub.docker.com/r/maprtech/mapr-sandbox-base/
> The proposal of this JIRA is to provide a community version Dockerfile in
> Hadoop, and here are some requirements:
> 1. Separate Docker images for master & agents, e.g. resource manager & node
> manager
> 2. Default configuration to start master & agent instead of configuring
> manually
> 3. Start Hadoop processes as non-daemons
> Here's my dockerfile to start master/agent:
> https://github.com/k82cn/outrider/tree/master/kubernetes/imgs/yarn
> I'd like to contribute it after polishing :).
> Email Thread :
> http://mail-archives.apache.org/mod_mbox/hadoop-user/201607.mbox/%3CSG2PR04MB162977CFE150444FA022510FB6370%40SG2PR04MB1629.apcprd04.prod.outlook.com%3E