[ 
https://issues.apache.org/jira/browse/SINGA-11?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14966279#comment-14966279
 ] 

ASF subversion and git services commented on SINGA-11:
------------------------------------------------------

Commit 1f513ec1e6ea75c8b1dcd0582022a7669bab595e in incubator-singa's branch 
refs/heads/master from [~ug93tad]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-singa.git;h=1f513ec ]

SINGA-89  Add Docker support

Docker is a Linux container platform that is fast to deploy and incurs little 
overhead. A Docker container functions like a lightweight virtual machine and 
runs in its own isolated environment. There are at least two benefits to 
adding Docker support for SINGA:

+ Out of the box deployment of SINGA: the image, once built, contains the 
complete environment necessary to start SINGA.

+ Light-weight development and testing environment for distributed features. 
The image can be used to set up a test-bed consisting of many independent nodes 
(each node has its own IP address), without the need for cluster hardware.

We add to GitHub a Dockerfile specifying how to build the SINGA image, from 
which the user can construct the image by executing:

sudo docker build -t singa/base .

The user can choose a name other than singa/base. The build process can take 
a long time, but it needs to be done on only one host; the resulting image can 
then be copied to other hosts.

We also add another Dockerfile which adds Mesos and Hadoop on top of SINGA. This 
is closely related to SINGA-11. The image created from this Dockerfile is used 
to set up the distributed test bed.

See the README.md for more details of the images and how to use them.

IMPORTANT We assume that every host has Docker running. Two nodes on the epiC 
cluster (`ciidaa-c18` and `ciidaa-c19`) are set up with Docker and the 
pre-built images.

IMPORTANT For now, the Dockerfile for building Mesos+Hadoop+SINGA pulls the 
latest code from ug93tad's (Anh's) SINGA-11 branch. This pull step will be 
removed once SINGA-11 is merged into the project's master branch.


> Start SINGA on Apache Mesos
> ---------------------------
>
>                 Key: SINGA-11
>                 URL: https://issues.apache.org/jira/browse/SINGA-11
>             Project: Singa
>          Issue Type: New Feature
>            Reporter: wangwei
>            Assignee: Anh Dinh
>
> Apache Mesos is a fine-grained cluster management framework which enables 
> resource sharing in the same cluster. Mesos abstracts out the physical 
> configurations of cluster nodes, and presents resources to the users in the 
> form of "offers". SINGA uses Mesos for two purposes:
> # To acquire necessary resources for training the model.
> # To launch and monitor progress of the training task.
> To these ends, we implement a {{SINGA Scheduler}} which interacts with Mesos 
> master. The scheduler is called when the user wants to start a new SINGA job, 
> and it performs the following steps:
> # Read the job configuration file to determine necessary resources in terms 
> of CPUs, memory and storage.
> # Wait for resource offers from the Mesos master.
> # Determine whether the offers meet the resource requirements.
> # Prepare the task to launch at each slave:
> #* Deliver the job configuration file to the slave node.
> #* Specify the command to run on the slave:
> {code}
> singa -conf ./job.conf
> {code}
> #* Launch and monitor the progress
> For step 3, we currently implement a simple scheme: the number of CPUs 
> offered by each Mesos slave must exceed the total number of SINGA workers and 
> SINGA servers per process. In other words, each selected slave must be able to 
> run the entire worker group or server group.
> For step 4, we currently rely on HDFS to deliver the configuration file to 
> each slave. Specifically, we write the file to a known directory (different 
> for each job) on HDFS and ask the slave to use its Fetcher utility to download 
> the file before executing the task.
> The development and testing environment for this ticket is created from 
> [SINGA-89|https://issues.apache.org/jira/browse/SINGA-89]
> We will create a {{README.md}} file explaining the steps.
> h5. Important
> We assume that SINGA, Mesos and Hadoop are running at every node. 

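The offer check (step 3) and task preparation (step 4) described in the quoted 
issue can be sketched roughly as follows. This is only an illustration, not 
the SINGA Scheduler itself: the Offer class, the function names, the returned 
dictionary layout and the HDFS root hdfs://namenode:9000/singa are all 
hypothetical, and the real scheduler talks to the Mesos master through the 
Mesos API rather than through plain Python objects.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    """Hypothetical, simplified view of a Mesos resource offer."""
    slave: str
    cpus: int


def select_slaves(offers, procs_needed, threads_per_proc):
    """Pick slaves that can each host one whole SINGA process.

    threads_per_proc is the number of workers plus servers in one process.
    Returns the chosen slave names, or None when the offers are insufficient
    and the scheduler should keep waiting for more offers (step 2).
    """
    # The issue text says the offered CPUs must "exceed" the per-process
    # total; >= is used here on the assumption that an exact fit is enough.
    chosen = [o.slave for o in offers if o.cpus >= threads_per_proc]
    if len(chosen) < procs_needed:
        return None
    return chosen[:procs_needed]


def task_for(job_id, hdfs_root="hdfs://namenode:9000/singa"):
    """Describe the task to launch on a slave (step 4).

    The per-job directory layout under hdfs_root is an assumption; the issue
    only states that each job uses a different, known HDFS directory.
    """
    return {
        # Files the slave's fetcher should download before the task starts.
        "uris": [f"{hdfs_root}/{job_id}/job.conf"],
        # Command taken from the issue description.
        "command": "singa -conf ./job.conf",
    }
```

With offers of 4, 2 and 8 CPUs and processes that need 4 threads each, the 
sketch would select the first and third slaves for a two-process job and 
report that it must keep waiting for a three-process job.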


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
