The "dineshs/IsolatingYarnAppsInDockerContainers" page has been changed by 
dineshs:
https://wiki.apache.org/hadoop/dineshs/IsolatingYarnAppsInDockerContainers?action=diff&rev1=1&rev2=2

  
  == Motivation ==
  
- The advantages Containers and Docker offer to Hadoop YARN are well 
understood.  Here is a partial list.
+ The advantages Containers and Docker offer to Hadoop YARN are well understood:
  
-  * '''Isolation of software dependencies and configuration'''  With 
applications encapsulated within Docker containers, software dependencies and 
system configuration required for an application can be independently specified 
from that of the host and other applications running on the cluster.
-  * '''Security'''  The privilege scope of a task is limited to the container 
it runs in.  Root in the container would have no root privileges on the host 
for example.  Linux capabilities possessed by the task, devices accessible to 
it etc. can be controlled.
-  * '''Performance isolation'''  Containers provide dynamically tunable limits 
on a task's use of resources such as CPU, memory and IO bandwidth.
-  * '''Consistency'''  All tasks of an application run in an identical 
software environment defined by the container and its image, regardless of the 
state of the host.  For example, an application could run in an Ubuntu 
environment making use of Ubuntu-specific software, while the host itself runs 
RHEL.
-  * '''Quick provisioning'''  The central repository of container images 
decouples software state and configuration from hardware enabling a relatively 
stateless base platform to be rapidly provisioned for a YARN application by 
automatically pulling right container image on demand.
-  * '''Programmability'''  Dockerfiles provide a fast and canonical mechanism 
to produce the file system context and configuration required for a YARN 
application.
+ * '''Security.'''  YARN is typically deployed as a multi-tenant environment 
in large organizations with multiple groups sharing a common IT-managed 
cluster.  Tasks from different tenants could potentially be scheduled on the 
same host. Containers securely isolate those tasks by limiting the privilege 
scope of a task to the container in which it runs.  Root in the container is 
distinct from root on the host.  Even though root in a container can run 
privileged operations, they affect only the container's counterparts of the 
host resources, not the host itself.  The specific Linux capabilities possessed 
by the task, the devices accessible to it, etc. are adjusted for each container.
+ 
+ When combined with Software Defined Networking techniques, containers isolate 
the network traffic of different tenants' applications, so that the tasks of 
one tenant cannot maliciously or unintentionally snoop on the traffic of 
another tenant.
+ 
+ * '''Performance isolation.'''  Containers provide resource accounting and 
enforce resource limits on the processes running within them, which prevents 
applications from interfering with one another. For fine-grained control, the 
limits on CPU, memory and I/O bandwidth can be tuned on the fly as decided by 
the resource manager.
+ 
+ * '''Higher utilization by co-scheduling CPU-bound and I/O-bound jobs.'''  In 
a multi-tenant environment, applications have varying resource needs.  While 
some tasks are compute-intensive, others could be I/O-bound.  When the tasks of 
an I/O-bound job are scheduled on a node, the node's compute resources go 
unused, and vice versa for a compute-bound job.  Due to the security risk of 
co-locating the tasks of different tenants on a shared machine, the idle 
resources are not allocated to other tenants even if those tenants could use 
them.  Containers prevent such resource underutilization by securely isolating 
tasks from one another, so that they can be safely co-scheduled on the same 
host.
+ 
+ * '''Consistency.'''  Distributed YARN applications consist of tasks that 
need to run on different cluster nodes, each deployed with an identical host 
environment.  Any discrepancy between nodes may cause the application to 
misbehave.  Containers 
ensure that all the tasks of an application run in a consistent software 
environment defined by the container and its image, regardless of the state of 
the host.  For example, an application could run in an Ubuntu environment 
making use of Ubuntu-specific software, while the host itself runs RHEL.
+ 
+ * '''Isolation of software dependencies and configuration.'''  YARN is 
designed to be modular, with well-defined interfaces between applications and 
the YARN core.  This allows applications to be built as independent binaries, 
which often rely on third-party software.  For example, an application that 
predicts consumer spending based on linear regression might have a dependency 
on MATLAB.  Since the tasks of an application could potentially be scheduled to 
run on any host in the cluster, these software dependencies would have to be 
installed on all the cluster nodes.  A variety of applications all sharing the 
same YARN cluster can quickly clutter the nodes with their respective software 
dependencies. Installing every dependency on every host does not scale.  In 
some cases, the software dependencies and their versions may even conflict with 
one another.
+ 
+ With applications encapsulated in Docker containers, software dependencies 
and the system configuration required for them can be specified independently 
of the host and of other applications running on the cluster.
+ 
+ * '''Reproducible and programmable mechanism to define application 
environments.'''  Docker supports a mechanism to programmatically build out a 
consistent environment required for YARN applications.  The build process can 
be run offline with its products stored in the central repository of container 
images.  At the time of deployment, the image bits are quickly streamed into 
the cluster without incurring the overhead of runtime configuration.
+ 
+ * '''Rapid provisioning.'''  The central repository of container images 
decouples software state and configuration from the hardware, enabling a 
relatively stateless base platform to be rapidly provisioned for a YARN 
application by automatically pulling the right container image on demand.  
When the job finishes, the containers are simply removed, returning the cluster 
to its pristine state.
  
  == Work items ==
  
