[ 
https://issues.apache.org/jira/browse/HADOOP-13397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16017427#comment-16017427
 ] 

Elek, Marton commented on HADOOP-13397:
---------------------------------------

I am interested in the Docker images, as I plan to create additional 
getting-started tutorials based on them.

I tested mkhdf and it worked well. I also have experience with my own Docker 
images: I am running Hadoop/Spark/HBase and other clusters with Docker images 
where every service runs in a separate container (see 
http://github.com/elek/bigdata-docker/ if you are interested).

I suggest splitting this JIRA because, as I see it, there are two parts:

1. One part is the role of mkhdf, which can create a self-contained, customized 
Dockerfile according to its parameters.

2. I think it's a separate task to create (or generate with mkhdf) one exact 
Dockerfile, commit it to a new branch in the Hadoop git repository, and ask 
INFRA to register the new branch with Docker Hub.

My proposal for the second part is here: 
https://github.com/elek/hadoop/tree/docker-2.8.0

The example to use is here:

https://github.com/elek/hadoop/blob/docker-2.8.0/example/docker-compose.yaml

As you can see, everything can be configured with environment variables, thanks 
to a simple script which converts the environment variables to the Hadoop XML 
(and other property) formats.
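
To illustrate the idea, here is a minimal Python sketch of such a conversion. 
It is not the script from the branch linked above. The naming scheme 
(FILENAME.XML_property.name=value) and the function names group_by_file and 
to_hadoop_xml are assumptions made for this example only.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def group_by_file(env):
    """Group entries named '<FILE>.XML_<key>' by target file name.

    The '<FILE>.XML_<key>' naming scheme is an assumption for this sketch.
    Variables that do not match it (PATH, HOME, ...) are ignored.
    """
    grouped = defaultdict(dict)
    for name, value in env.items():
        # Split at the first underscore: file prefix on the left, key on the right.
        prefix, sep, key = name.partition("_")
        if sep and key and prefix.upper().endswith(".XML"):
            grouped[prefix.lower()][key] = value
    return dict(grouped)


def to_hadoop_xml(properties):
    """Render a dict of properties as a Hadoop <configuration> document."""
    root = ET.Element("configuration")
    for key in sorted(properties):
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = key
        ET.SubElement(prop, "value").text = properties[key]
    return ET.tostring(root, encoding="unicode")
```

A container entrypoint could call group_by_file(dict(os.environ)) and write 
each rendered document to the Hadoop configuration directory before starting 
the service. For example, CORE-SITE.XML_fs.defaultFS=hdfs://namenode:9000 
would become a fs.defaultFS property in core-site.xml.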

I would be happy to contribute this type of configuration loading to mkhdf 
as a separate module. But as I wrote, I think these are two different things, 
and by creating two separate JIRAs, I think we can create apache/hadoop images 
without being blocked on the mkhdf script.

> Add dockerfile for Hadoop
> -------------------------
>
>                 Key: HADOOP-13397
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13397
>             Project: Hadoop Common
>          Issue Type: Bug
>            Reporter: Klaus Ma
>            Assignee: Allen Wittenauer
>         Attachments: HADOOP-13397.DNC001.patch
>
>
> For now, there's no community version of a Dockerfile in Hadoop; most Docker 
> images are provided by vendors, e.g. 
> 1. Cloudera's image: https://hub.docker.com/r/cloudera/quickstart/
> 2. From HortonWorks sequenceiq: 
> https://hub.docker.com/r/sequenceiq/hadoop-docker/
> 3. MapR provides the mapr-sandbox-base: 
> https://hub.docker.com/r/maprtech/mapr-sandbox-base/
> The proposal of this JIRA is to provide a community version of a Dockerfile 
> in Hadoop, and here are some requirements:
> 1. Separate Docker images for master & agents, e.g. resource manager & node 
> manager
> 2. Default configuration to start master & agent instead of configuring them 
> manually
> 3. Start the Hadoop processes in non-daemon mode
> Here's my dockerfile to start master/agent: 
> https://github.com/k82cn/outrider/tree/master/kubernetes/imgs/yarn
> I'd like to contribute it after polishing :).
> Email Thread : 
> http://mail-archives.apache.org/mod_mbox/hadoop-user/201607.mbox/%3CSG2PR04MB162977CFE150444FA022510FB6370%40SG2PR04MB1629.apcprd04.prod.outlook.com%3E



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
