http://git-wip-us.apache.org/repos/asf/flink/blob/47070674/docs/ops/deployment/docker.md
----------------------------------------------------------------------
diff --git a/docs/ops/deployment/docker.md b/docs/ops/deployment/docker.md
new file mode 100644
index 0000000..4986f2a
--- /dev/null
+++ b/docs/ops/deployment/docker.md
@@ -0,0 +1,102 @@
+---
+title:  "Docker Setup"
+nav-title: Docker
+nav-parent_id: deployment
+nav-pos: 4
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+[Docker](https://www.docker.com) is a popular container runtime. There are
+official Docker images for Apache Flink available on Docker Hub which can be
+used directly or extended to better integrate into a production environment.
+
+* This will be replaced by the TOC
+{:toc}
+
+## Official Docker Images
+
+The [official Docker repository](https://hub.docker.com/_/flink/) is
+hosted on Docker Hub and serves images of Flink version 1.2.1 and later.
+
+Images for each supported combination of Hadoop and Scala are available, and
+tag aliases are provided for convenience.
+
+For example, the following aliases can be used: *(`1.2.y` indicates the latest
+release of Flink 1.2)*
+
+* `flink:latest` →
+`flink:<latest-flink>-hadoop<latest-hadoop>-scala_<latest-scala>`
+* `flink:1.2` → `flink:1.2.y-hadoop27-scala_2.11`
+* `flink:1.2.1-scala_2.10` → `flink:1.2.1-hadoop27-scala_2.10`
+* `flink:1.2-hadoop26` → `flink:1.2.y-hadoop26-scala_2.11`
+
+<!-- NOTE: uncomment when docker-flink/docker-flink/issues/14 is resolved. -->
+<!--
+Additionally, images based on Alpine Linux are available. Reference them by
+appending `-alpine` to the tag. For the Alpine version of `flink:latest`, use
+`flink:alpine`.
+
+For example:
+
+* `flink:alpine`
+* `flink:1.2.1-alpine`
+* `flink:1.2-scala_2.10-alpine`
+-->
+
+**Note:** The Docker images are provided as a community project by individuals
+on a best-effort basis. They are not official releases by the Apache Flink PMC.
+
+## Flink with Docker Compose
+
+[Docker Compose](https://docs.docker.com/compose/) is a convenient way to run a
+group of Docker containers locally.
+
+An [example config file](https://github.com/docker-flink/examples/blob/master/docker-compose.yml)
+is available on GitHub.
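+
+As a rough illustration of what such a file contains, the following is a minimal
+sketch rather than a copy of the linked example. The service names `jobmanager`
+and `taskmanager` are assumptions chosen to match the commands under *Usage*;
+the image, arguments, and the `JOB_MANAGER_RPC_ADDRESS` variable follow the
+conventions used in the Kubernetes resources of these docs.
+
+```yaml
+# docker-compose.yml (illustrative sketch, not the official example file)
+version: "2"
+services:
+  jobmanager:
+    image: flink:latest
+    command: jobmanager
+    ports:
+      - "8081:8081"        # web UI, reachable at http://localhost:8081
+    environment:
+      - JOB_MANAGER_RPC_ADDRESS=jobmanager
+  taskmanager:
+    image: flink:latest
+    command: taskmanager
+    depends_on:
+      - jobmanager
+    environment:
+      - JOB_MANAGER_RPC_ADDRESS=jobmanager
+```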
+
+### Usage
+
+* Launch a cluster in the foreground
+
+        docker-compose up
+
+* Launch a cluster in the background
+
+        docker-compose up -d
+
+* Scale the cluster up or down to *N* TaskManagers
+
+        docker-compose scale taskmanager=<N>
+
+When the cluster is running, you can visit the web UI at
+[http://localhost:8081](http://localhost:8081) and submit a job.
+
+To submit a job via the command line, you must copy the JAR to the JobManager
+container and submit the job from there.
+
+For example:
+
+{% raw %}
+    $ JOBMANAGER_CONTAINER=$(docker ps --filter name=jobmanager --format={{.ID}})
+    $ docker cp path/to/jar "$JOBMANAGER_CONTAINER":/job.jar
+    $ docker exec -t -i "$JOBMANAGER_CONTAINER" flink run /job.jar
+{% endraw %}
+
+{% top %}

http://git-wip-us.apache.org/repos/asf/flink/blob/47070674/docs/ops/deployment/gce_setup.md
----------------------------------------------------------------------
diff --git a/docs/ops/deployment/gce_setup.md b/docs/ops/deployment/gce_setup.md
new file mode 100644
index 0000000..2925737
--- /dev/null
+++ b/docs/ops/deployment/gce_setup.md
@@ -0,0 +1,93 @@
+---
+title:  "Google Compute Engine Setup"
+nav-title: Google Compute Engine
+nav-parent_id: deployment
+nav-pos: 6
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+
+This documentation provides instructions on how to set up Flink fully automatically with Hadoop 1 or Hadoop 2 on top of a [Google Compute Engine](https://cloud.google.com/compute/) cluster. This is made possible by Google's [bdutil](https://cloud.google.com/hadoop/bdutil), which starts a cluster and deploys Flink with Hadoop. To get started, just follow the steps below.
+
+* This will be replaced by the TOC
+{:toc}
+
+# Prerequisites
+
+## Install Google Cloud SDK
+
+Please follow the instructions on how to set up the [Google Cloud SDK](https://cloud.google.com/sdk/). In particular, make sure to authenticate with Google Cloud using the following command:
+
+    gcloud auth login
+
+## Install bdutil
+
+At the moment, no bdutil release includes the Flink
+extension yet. However, you can get the latest version of bdutil with Flink support
+from [GitHub](https://github.com/GoogleCloudPlatform/bdutil):
+
+    git clone https://github.com/GoogleCloudPlatform/bdutil.git
+
+After you have downloaded the source, change into the newly created `bdutil` directory and continue with the next steps.
+
+# Deploying Flink on Google Compute Engine
+
+## Set up a bucket
+
+If you have not done so, create a bucket for the bdutil config and staging files. A new bucket can be created with gsutil:
+
+    gsutil mb gs://<bucket_name>
+
+## Adapt the bdutil config
+
+To deploy Flink with bdutil, adapt at least the following variables in
+`bdutil_env.sh`:
+
+    CONFIGBUCKET="<bucket_name>"
+    PROJECT="<compute_engine_project_name>"
+    NUM_WORKERS=<number_of_workers>
+
+    # set this to 'n1-standard-2' if you're using the free trial
+    GCE_MACHINE_TYPE="<gce_machine_type>"
+
+    # for example: "europe-west1-d"
+    GCE_ZONE="<gce_zone>"
+
+## Adapt the Flink config
+
+bdutil's Flink extension handles the configuration for you. You may additionally adjust configuration variables in `extensions/flink/flink_env.sh`. For further configuration options, please take a look at [configuring Flink](../config.html). After changing the configuration, you will have to restart Flink using `bin/stop-cluster.sh` and `bin/start-cluster.sh`.
+
+## Bring up a cluster with Flink
+
+To bring up the Flink cluster on Google Compute Engine, execute:
+
+    ./bdutil -e extensions/flink/flink_env.sh deploy
+
+## Run a Flink example job
+
+    ./bdutil shell
+    cd /home/hadoop/flink-install/bin
+    ./flink run ../examples/batch/WordCount.jar gs://dataflow-samples/shakespeare/othello.txt gs://<bucket_name>/output
+
+## Shut down your cluster
+
+To shut down the cluster, execute:
+
+    ./bdutil -e extensions/flink/flink_env.sh delete

http://git-wip-us.apache.org/repos/asf/flink/blob/47070674/docs/ops/deployment/index.md
----------------------------------------------------------------------
diff --git a/docs/ops/deployment/index.md b/docs/ops/deployment/index.md
new file mode 100644
index 0000000..e82299d
--- /dev/null
+++ b/docs/ops/deployment/index.md
@@ -0,0 +1,24 @@
+---
+title: "Clusters & Deployment"
+nav-id: deployment
+nav-parent_id: ops
+nav-pos: 1
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->

http://git-wip-us.apache.org/repos/asf/flink/blob/47070674/docs/ops/deployment/kubernetes.md
----------------------------------------------------------------------
diff --git a/docs/ops/deployment/kubernetes.md b/docs/ops/deployment/kubernetes.md
new file mode 100644
index 0000000..0790a05
--- /dev/null
+++ b/docs/ops/deployment/kubernetes.md
@@ -0,0 +1,157 @@
+---
+title:  "Kubernetes Setup"
+nav-title: Kubernetes
+nav-parent_id: deployment
+nav-pos: 4
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+[Kubernetes](https://kubernetes.io) is a container orchestration system.
+
+* This will be replaced by the TOC
+{:toc}
+
+## Simple Kubernetes Flink Cluster
+
+A basic Flink cluster deployment in Kubernetes has three components:
+
+* a Deployment for a single JobManager
+* a Deployment for a pool of TaskManagers
+* a Service exposing the JobManager's RPC and UI ports
+
+### Launching the cluster
+
+Using the [resource definitions found below](#simple-kubernetes-flink-cluster-resources),
+launch the cluster with the `kubectl` command:
+
+    kubectl create -f jobmanager-deployment.yaml
+    kubectl create -f taskmanager-deployment.yaml
+    kubectl create -f jobmanager-service.yaml
+
+You can then access the Flink UI via `kubectl proxy`:
+
+1. Run `kubectl proxy` in a terminal
+2. Navigate to [http://localhost:8001/api/v1/proxy/namespaces/default/services/flink-jobmanager:8081](http://localhost:8001/api/v1/proxy/namespaces/default/services/flink-jobmanager:8081) in your browser
+
+### Deleting the cluster
+
+Again, use `kubectl` to delete the cluster:
+
+    kubectl delete -f jobmanager-deployment.yaml
+    kubectl delete -f taskmanager-deployment.yaml
+    kubectl delete -f jobmanager-service.yaml
+
+## Advanced Cluster Deployment
+
+An early version of a [Flink Helm chart](https://github.com/docker-flink/examples)
+is available on GitHub.
+
+## Appendix
+
+### Simple Kubernetes Flink cluster resources
+
+`jobmanager-deployment.yaml`
+{% highlight yaml %}
+apiVersion: extensions/v1beta1
+kind: Deployment
+metadata:
+  name: flink-jobmanager
+spec:
+  replicas: 1
+  template:
+    metadata:
+      labels:
+        app: flink
+        component: jobmanager
+    spec:
+      containers:
+      - name: jobmanager
+        image: flink:latest
+        args:
+        - jobmanager
+        ports:
+        - containerPort: 6123
+          name: rpc
+        - containerPort: 6124
+          name: blob
+        - containerPort: 6125
+          name: query
+        - containerPort: 8081
+          name: ui
+        env:
+        - name: JOB_MANAGER_RPC_ADDRESS
+          value: flink-jobmanager
+{% endhighlight %}
+
+`taskmanager-deployment.yaml`
+{% highlight yaml %}
+apiVersion: extensions/v1beta1
+kind: Deployment
+metadata:
+  name: flink-taskmanager
+spec:
+  replicas: 2
+  template:
+    metadata:
+      labels:
+        app: flink
+        component: taskmanager
+    spec:
+      containers:
+      - name: taskmanager
+        image: flink:latest
+        args:
+        - taskmanager
+        ports:
+        - containerPort: 6121
+          name: data
+        - containerPort: 6122
+          name: rpc
+        - containerPort: 6125
+          name: query
+        env:
+        - name: JOB_MANAGER_RPC_ADDRESS
+          value: flink-jobmanager
+{% endhighlight %}
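+
+The size of the TaskManager pool is set by `replicas` in this Deployment. As a
+hedged sketch, a fragment like the following (the value `4` is only an example)
+would request four TaskManagers instead of two; the rest of the manifest stays
+unchanged.
+
+```yaml
+# fragment of taskmanager-deployment.yaml; 4 is an illustrative value
+spec:
+  replicas: 4
+```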
+
+`jobmanager-service.yaml`
+{% highlight yaml %}
+apiVersion: v1
+kind: Service
+metadata:
+  name: flink-jobmanager
+spec:
+  ports:
+  - name: rpc
+    port: 6123
+  - name: blob
+    port: 6124
+  - name: query
+    port: 6125
+  - name: ui
+    port: 8081
+  selector:
+    app: flink
+    component: jobmanager
+{% endhighlight %}
+
+{% top %}

http://git-wip-us.apache.org/repos/asf/flink/blob/47070674/docs/ops/deployment/mapr_setup.md
----------------------------------------------------------------------
diff --git a/docs/ops/deployment/mapr_setup.md b/docs/ops/deployment/mapr_setup.md
new file mode 100644
index 0000000..7575bdc
--- /dev/null
+++ b/docs/ops/deployment/mapr_setup.md
@@ -0,0 +1,132 @@
+---
+title:  "MapR Setup"
+nav-title: MapR
+nav-parent_id: deployment
+nav-pos: 7
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+This documentation provides instructions on how to prepare Flink for YARN
+executions on a [MapR](https://mapr.com/) cluster.
+
+* This will be replaced by the TOC
+{:toc}
+
+## Running Flink on YARN with MapR
+
+The instructions below assume MapR version 5.2.0. They guide you through the
+steps needed to submit [Flink on YARN]({{ site.baseurl }}/ops/deployment/yarn_setup.html)
+jobs or sessions to a MapR cluster.
+
+### Building Flink for MapR
+
+In order to run Flink on MapR, Flink needs to be built with MapR's own
+Hadoop and Zookeeper distribution. Simply build Flink using Maven with
+the following command from the project root directory:
+
+```
+mvn clean install -DskipTests -Pvendor-repos,mapr \
+    -Dhadoop.version=2.7.0-mapr-1607 \
+    -Dzookeeper.version=3.4.5-mapr-1604
+```
+
+The `vendor-repos` build profile adds MapR's repository to the build so that
+MapR's Hadoop / Zookeeper dependencies can be fetched. The `mapr` build
+profile additionally resolves some dependency clashes between MapR and
+Flink, as well as ensuring that the native MapR libraries on the cluster
+nodes are used. Both profiles must be activated.
+
+By default the `mapr` profile builds with Hadoop / Zookeeper dependencies
+for MapR version 5.2.0, so you don't need to explicitly override
+the `hadoop.version` and `zookeeper.version` properties.
+For different MapR versions, simply override these properties to appropriate
+values. The corresponding Hadoop / Zookeeper distributions for each MapR version
+are listed in the MapR documentation, for example in the
+[Maven artifacts guide](http://maprdocs.mapr.com/home/DevelopmentGuide/MavenArtifacts.html).
+
+### Job Submission Client Setup
+
+The client submitting Flink jobs to MapR also needs to be prepared with the following setup.
+
+Ensure that MapR's JAAS config file is picked up to avoid login failures:
+
+```
+export JVM_ARGS=-Djava.security.auth.login.config=/opt/mapr/conf/mapr.login.conf
+```
+
+Make sure that the `yarn.nodemanager.resource.cpu-vcores` property is set in `yarn-site.xml`:
+
+~~~xml
+<!-- in /opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop/yarn-site.xml -->
+
+<configuration>
+...
+
+<property>
+    <name>yarn.nodemanager.resource.cpu-vcores</name>
+    <value>...</value>
+</property>
+
+...
+</configuration>
+~~~
+
+Also remember to set the `YARN_CONF_DIR` or `HADOOP_CONF_DIR` environment
+variables to the path where `yarn-site.xml` is located:
+
+```
+export YARN_CONF_DIR=/opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop/
+export HADOOP_CONF_DIR=/opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop/
+```
+
+Make sure that the MapR native libraries are picked up in the classpath:
+
+```
+export FLINK_CLASSPATH=/opt/mapr/lib/*
+```
+
+If you'll be starting Flink on YARN sessions with `yarn-session.sh`, the
+following is also required:
+
+```
+export CC_CLASSPATH=/opt/mapr/lib/*
+```
+
+## Running Flink with a Secured MapR Cluster
+
+*Note: In Flink 1.2.0, Flink's Kerberos authentication for YARN execution has
+a bug that prevents it from working with MapR Security. Please upgrade to a later Flink
+version in order to use Flink with a secured MapR cluster. For more details,
+please see [FLINK-5949](https://issues.apache.org/jira/browse/FLINK-5949).*
+
+Flink's [Kerberos authentication]({{ site.baseurl }}/ops/security-kerberos.html) is independent of
+[MapR's Security authentication](http://maprdocs.mapr.com/home/SecurityGuide/Configuring-MapR-Security.html).
+With the above build procedures and environment variable setups, Flink
+does not require any additional configuration to work with MapR Security.
+
+Users simply need to log in using MapR's `maprlogin` authentication
+utility. Users who have not acquired MapR login credentials will not be
+able to submit Flink jobs; submission fails with:
+
+```
+java.lang.Exception: unable to establish the security context
+Caused by: o.a.f.r.security.modules.SecurityModule$SecurityInstallException: Unable to set the Hadoop login user
+Caused by: java.io.IOException: failure to login: Unable to obtain MapR credentials
+```

http://git-wip-us.apache.org/repos/asf/flink/blob/47070674/docs/ops/deployment/mesos.md
----------------------------------------------------------------------
diff --git a/docs/ops/deployment/mesos.md b/docs/ops/deployment/mesos.md
new file mode 100644
index 0000000..2fa340d
--- /dev/null
+++ b/docs/ops/deployment/mesos.md
@@ -0,0 +1,269 @@
+---
+title:  "Mesos Setup"
+nav-title: Mesos
+nav-parent_id: deployment
+nav-pos: 3
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+* This will be replaced by the TOC
+{:toc}
+
+## Background
+
+The Mesos implementation consists of two components: the application master and
+the worker. The workers are simple TaskManagers which are parameterized by the environment
+set up by the application master. The most sophisticated component of the Mesos
+implementation is the application master. The application master currently hosts
+the following components:
+
+### Mesos Scheduler
+
+The scheduler is responsible for registering the framework with Mesos,
+requesting resources, and launching worker nodes. The scheduler continuously
+needs to report back to Mesos to ensure the framework is in a healthy state. To
+verify the health of the cluster, the scheduler monitors the spawned workers and
+marks them as failed and restarts them if necessary.
+
+Flink's Mesos scheduler itself is currently not highly available. However, it
+persists all necessary information about its state (e.g. configuration, list of
+workers) in Zookeeper. In the presence of a failure, it relies on an external
+system to bring up a new scheduler. The scheduler will then register with Mesos
+again and go through the reconciliation phase. In the reconciliation phase, the
+scheduler receives a list of running worker nodes. It matches these against the
+recovered information from Zookeeper and makes sure to restore the cluster to
+the state before the failure.
+
+### Artifact Server
+
+The artifact server is responsible for providing resources to the worker
+nodes. The resources can be anything from the Flink binaries to shared secrets
+or configuration files. For instance, in non-containerized environments, the
+artifact server will provide the Flink binaries. Which files are served
+depends on the configuration overlay used.
+
+### Flink's JobManager and Web Interface
+
+The Mesos scheduler currently resides with the JobManager but will be started
+independently of the JobManager in future versions (see
+[FLIP-6](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65147077)). The
+proposed changes will also add a Dispatcher component which will be the central
+point for job submission and monitoring.
+
+### Startup script and configuration overlays
+
+The startup script provides a way to configure and start the application
+master. All further configuration is then inherited by the worker nodes. This
+is achieved using configuration overlays. Configuration overlays provide a way
+to infer configuration from environment variables and config files which are
+shipped to the worker nodes.
+
+
+## DC/OS
+
+This section refers to [DC/OS](https://dcos.io) which is a Mesos distribution
+with a sophisticated application management layer. It comes pre-installed with
+Marathon, a service to supervise applications and maintain their state in case
+of failures.
+
+If you don't have a running DC/OS cluster, please follow the
+[instructions on how to install DC/OS on the official website](https://dcos.io/install/).
+
+Once you have a DC/OS cluster, you may install Flink through the DC/OS
+Universe. In the search prompt, just search for Flink. Alternatively, you can use the DC/OS CLI:
+
+    dcos package install flink
+
+Further information can be found in the
+[DC/OS examples documentation](https://github.com/dcos/examples/tree/master/1.8/flink).
+
+
+## Mesos without DC/OS
+
+You can also run Mesos without DC/OS.
+
+### Installing Mesos
+
+Please follow the [instructions on how to set up Mesos on the official website](http://mesos.apache.org/documentation/latest/getting-started/).
+
+After installation you have to configure the set of master and agent nodes by creating the files `MESOS_HOME/etc/mesos/masters` and `MESOS_HOME/etc/mesos/slaves`.
+Each row of these files contains a single hostname on which the respective component will be started (assuming SSH access to these nodes).
+
+Next you have to create `MESOS_HOME/etc/mesos/mesos-master-env.sh` or use the template found in the same directory.
+In this file, you have to define
+
+    export MESOS_work_dir=WORK_DIRECTORY
+
+and it is recommended to uncomment
+
+    export MESOS_log_dir=LOGGING_DIRECTORY
+
+
+In order to configure the Mesos agents, you have to create `MESOS_HOME/etc/mesos/mesos-agent-env.sh` or use the template found in the same directory.
+You have to configure
+
+    export MESOS_master=MASTER_HOSTNAME:MASTER_PORT
+
+and uncomment
+
+    export MESOS_log_dir=LOGGING_DIRECTORY
+    export MESOS_work_dir=WORK_DIRECTORY
+
+#### Mesos Library
+
+In order to run Java applications with Mesos you have to export `MESOS_NATIVE_JAVA_LIBRARY=MESOS_HOME/lib/libmesos.so` on Linux.
+Under Mac OS X you have to export `MESOS_NATIVE_JAVA_LIBRARY=MESOS_HOME/lib/libmesos.dylib`.
+
+#### Deploying Mesos
+
+In order to start your Mesos cluster, use the deployment script `MESOS_HOME/sbin/mesos-start-cluster.sh`.
+In order to stop your Mesos cluster, use the deployment script `MESOS_HOME/sbin/mesos-stop-cluster.sh`.
+More information about the deployment scripts can be found [here](http://mesos.apache.org/documentation/latest/deploy-scripts/).
+
+### Installing Marathon
+
+Optionally, you may also [install Marathon](https://mesosphere.github.io/marathon/docs/) which will be necessary to run Flink in high availability (HA) mode.
+
+### Pre-installing Flink vs Docker/Mesos containers
+
+You may install Flink on all of your Mesos master and agent nodes.
+You can also pull the binaries from the Flink web site during deployment and apply your custom configuration before launching the application master.
+A more convenient and easier to maintain approach is to use Docker containers to manage the Flink binaries and configuration.
+
+This is controlled via the following configuration entries:
+
+    mesos.resourcemanager.tasks.container.type: mesos _or_ docker
+
+If set to 'docker', specify the image name:
+
+    mesos.resourcemanager.tasks.container.image.name: image_name
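+
+Put together, a `flink-conf.yaml` fragment for the Docker case might look like
+the sketch below. The image name is a placeholder assumption, not a published
+image.
+
+```yaml
+# flink-conf.yaml fragment (sketch); replace the placeholder image name
+mesos.resourcemanager.tasks.container.type: docker
+mesos.resourcemanager.tasks.container.image.name: example-registry/flink:latest
+```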
+
+
+### Standalone
+
+In the `/bin` directory of the Flink distribution, you will find two startup scripts
+which manage the Flink processes in a Mesos cluster:
+
+1. `mesos-appmaster.sh`
+   This starts the Mesos application master which will register the Mesos scheduler.
+   It is also responsible for starting up the worker nodes.
+
+2. `mesos-taskmanager.sh`
+   The entry point for the Mesos worker processes.
+   You don't need to explicitly execute this script.
+   It is automatically launched by the Mesos worker node to bring up a new TaskManager.
+
+In order to run the `mesos-appmaster.sh` script you have to define `mesos.master` in the `flink-conf.yaml` or pass it via `-Dmesos.master=...` to the Java process.
+Additionally, you should define the number of task managers which are started by Mesos via `mesos.initial-tasks`.
+This value can also be defined in the `flink-conf.yaml` or passed as a Java property.
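+
+For instance, the two settings could be placed in `flink-conf.yaml` as in this
+sketch. The host name and task count are example values borrowed from the
+command-line example under *General configuration*, not recommendations.
+
+```yaml
+# flink-conf.yaml fragment (sketch with example values)
+mesos.master: master.foobar.org:5050
+mesos.initial-tasks: 10
+```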
+
+Executing `mesos-appmaster.sh` creates a job manager on the machine where you executed the script.
+In contrast, the task managers will be run as Mesos tasks in the Mesos cluster.
+
+#### General configuration
+
+It is possible to completely parameterize a Mesos application through Java properties passed to the Mesos application master.
+This also allows you to specify general Flink configuration parameters.
+For example:
+
+    bin/mesos-appmaster.sh \
+        -Dmesos.master=master.foobar.org:5050 \
+        -Djobmanager.heap.mb=1024 \
+        -Djobmanager.rpc.port=6123 \
+        -Djobmanager.web.port=8081 \
+        -Dmesos.initial-tasks=10 \
+        -Dmesos.resourcemanager.tasks.mem=4096 \
+        -Dtaskmanager.heap.mb=3500 \
+        -Dtaskmanager.numberOfTaskSlots=2 \
+        -Dparallelism.default=10
+
+
+### High Availability
+
+You will need to run a service like Marathon or Apache Aurora which takes care of restarting the Flink master process in case of node or process failures.
+In addition, Zookeeper needs to be configured as described in the [High Availability section of the Flink docs]({{ site.baseurl }}/ops/jobmanager_high_availability.html).
+
+For the reconciliation of tasks to work correctly, please also set `high-availability.zookeeper.path.mesos-workers` to a valid Zookeeper path.
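+
+A sketch of the relevant `flink-conf.yaml` entries is shown below. The ZooKeeper
+hosts and the path are placeholder assumptions, and the first two keys are the
+ones described on the linked high-availability page; check that page for the
+exact keys supported by your Flink version.
+
+```yaml
+# flink-conf.yaml fragment (sketch); hosts and path are placeholders
+high-availability: zookeeper
+high-availability.zookeeper.quorum: zk-host1:2181,zk-host2:2181
+high-availability.zookeeper.path.mesos-workers: /mesos-workers
+```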
+
+#### Marathon
+
+Marathon needs to be set up to launch the `bin/mesos-appmaster.sh` script.
+In particular, the launch command should also set any configuration parameters for the Flink cluster.
+
+Here is an example configuration for Marathon:
+
+    {
+        "id": "flink",
+        "cmd": "$FLINK_HOME/bin/mesos-appmaster.sh -Djobmanager.heap.mb=1024 -Djobmanager.rpc.port=6123 -Djobmanager.web.port=8081 -Dmesos.initial-tasks=1 -Dmesos.resourcemanager.tasks.mem=1024 -Dtaskmanager.heap.mb=1024 -Dtaskmanager.numberOfTaskSlots=2 -Dparallelism.default=2 -Dmesos.resourcemanager.tasks.cpus=1",
+        "cpus": 1.0,
+        "mem": 1024
+    }
+
+When running Flink with Marathon, the whole Flink cluster including the job manager will be run as Mesos tasks in the Mesos cluster.
+
+### Configuration parameters
+
+`mesos.initial-tasks`: The initial workers to bring up when the master starts (**DEFAULT**: The number of workers specified at cluster startup).
+
+`mesos.constraints.hard.hostattribute`: Constraints for task placement on Mesos based on agent attributes (**DEFAULT**: None).
+Takes a comma-separated list of key:value pairs corresponding to the attributes exposed by the target
+Mesos agents.  Example: `az:eu-west-1a,series:t2`
+
+`mesos.maximum-failed-tasks`: The maximum number of failed workers before the cluster fails (**DEFAULT**: Number of initial workers).
+May be set to -1 to disable this feature.
+
+`mesos.master`: The Mesos master URL. The value should be in one of the following forms:
+
+* `host:port`
+* `zk://host1:port1,host2:port2,.../path`
+* `zk://username:password@host1:port1,host2:port2,.../path`
+* `file:///path/to/file`
+
+`mesos.failover-timeout`: The failover timeout in seconds for the Mesos scheduler, after which running tasks are automatically shut down (**DEFAULT:** 600).
+
+`mesos.resourcemanager.artifactserver.port`: The config parameter defining the Mesos artifact server port to use. Setting the port to 0 will let the OS choose an available port.
+
+`mesos.resourcemanager.framework.name`: Mesos framework name (**DEFAULT:** Flink)
+
+`mesos.resourcemanager.framework.role`: Mesos framework role definition (**DEFAULT:** *)
+
+`high-availability.zookeeper.path.mesos-workers`: The ZooKeeper root path for persisting the Mesos worker information.
+
+`mesos.resourcemanager.framework.principal`: Mesos framework principal (**NO DEFAULT**)
+
+`mesos.resourcemanager.framework.secret`: Mesos framework secret (**NO DEFAULT**)
+
+`mesos.resourcemanager.framework.user`: Mesos framework user (**DEFAULT:** "")
+
+`mesos.resourcemanager.artifactserver.ssl.enabled`: Enables SSL for the Flink artifact server (**DEFAULT**: true). Note that `security.ssl.enabled` also needs to be set to `true` to enable encryption.
+
+`mesos.resourcemanager.tasks.mem`: Memory to assign to the Mesos workers in MB (**DEFAULT**: 1024)
+
+`mesos.resourcemanager.tasks.cpus`: CPUs to assign to the Mesos workers (**DEFAULT**: 0.0)
+
+`mesos.resourcemanager.tasks.container.type`: Type of the containerization used: "mesos" or "docker" (**DEFAULT**: mesos)
+
+`mesos.resourcemanager.tasks.container.image.name`: Image name to use for the container (**NO DEFAULT**)
+
+`mesos.resourcemanager.tasks.container.volumes`: A comma-separated list of `[host_path:]container_path[:RO|RW]`. This allows for mounting additional volumes into your container. (**NO DEFAULT**)
+
+`mesos.resourcemanager.tasks.hostname`: Optional value to define the TaskManager's hostname. The pattern `_TASK_` is replaced by the actual id of the Mesos task. This can be used to configure the TaskManager to use Mesos DNS (e.g. `_TASK_.flink-service.mesos`) for name lookups. (**NO DEFAULT**)
+
+`mesos.resourcemanager.tasks.bootstrap-cmd`: A command which is executed before the TaskManager is started (**NO DEFAULT**).
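+
+To show how several of these parameters fit together, here is a hedged
+`flink-conf.yaml` sketch. Every value is illustrative: the ZooKeeper hosts and
+the `/mesos` path are placeholders, and the constraint and hostname values are
+the examples given in the descriptions above.
+
+```yaml
+# flink-conf.yaml fragment (sketch); every value is illustrative
+mesos.master: zk://host1:2181,host2:2181/mesos
+mesos.failover-timeout: 600
+mesos.resourcemanager.framework.name: Flink
+mesos.resourcemanager.tasks.mem: 1024
+mesos.resourcemanager.tasks.cpus: 1
+mesos.constraints.hard.hostattribute: az:eu-west-1a,series:t2
+mesos.resourcemanager.tasks.hostname: _TASK_.flink-service.mesos
+```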

http://git-wip-us.apache.org/repos/asf/flink/blob/47070674/docs/ops/deployment/yarn_setup.md
----------------------------------------------------------------------
diff --git a/docs/ops/deployment/yarn_setup.md 
b/docs/ops/deployment/yarn_setup.md
new file mode 100644
index 0000000..8c435f7
--- /dev/null
+++ b/docs/ops/deployment/yarn_setup.md
@@ -0,0 +1,338 @@
+---
+title:  "YARN Setup"
+nav-title: YARN
+nav-parent_id: deployment
+nav-pos: 2
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+* This will be replaced by the TOC
+{:toc}
+
+## Quickstart
+
+### Start a long-running Flink cluster on YARN
+
+Start a YARN session with 4 Task Managers (each with 4 GB of heap space):
+
+~~~bash
+# get the hadoop2 package from the Flink download page at
+# {{ site.download_url }}
+curl -O <flink_hadoop2_download_url>
+tar xvzf flink-{{ site.version }}-bin-hadoop2.tgz
+cd flink-{{ site.version }}/
+./bin/yarn-session.sh -n 4 -jm 1024 -tm 4096
+~~~
+
+Specify the `-s` flag for the number of processing slots per Task Manager. We 
recommend setting the number of slots to the number of processors per machine.
+
+Once the session has been started, you can submit jobs to the cluster using 
the `./bin/flink` tool.
+
+### Run a Flink job on YARN
+
+~~~bash
+# get the hadoop2 package from the Flink download page at
+# {{ site.download_url }}
+curl -O <flink_hadoop2_download_url>
+tar xvzf flink-{{ site.version }}-bin-hadoop2.tgz
+cd flink-{{ site.version }}/
+./bin/flink run -m yarn-cluster -yn 4 -yjm 1024 -ytm 4096 
./examples/batch/WordCount.jar
+~~~
+
+## Flink YARN Session
+
+Apache [Hadoop YARN](http://hadoop.apache.org/) is a cluster resource 
management framework. It allows various distributed applications to run on top 
of a cluster. Flink runs on YARN next to other applications. Users do not have 
to set up or install anything if there is already a YARN setup.
+
+**Requirements**
+
+- at least Apache Hadoop 2.2
+- HDFS (Hadoop Distributed File System) (or another distributed file system 
supported by Hadoop)
+
+If you have trouble using the Flink YARN client, have a look at the [FAQ 
section](http://flink.apache.org/faq.html#yarn-deployment).
+
+### Start Flink Session
+
+Follow these instructions to learn how to launch a Flink Session within your 
YARN cluster.
+
+A session will start all required Flink services (JobManager and TaskManagers) 
so that you can submit programs to the cluster. Note that you can run multiple 
programs per session.
+
+#### Download Flink
+
+Download a Flink package for Hadoop >= 2 from the [download page]({{ 
site.download_url }}). It contains the required files.
+
+Extract the package using:
+
+~~~bash
+tar xvzf flink-{{ site.version }}-bin-hadoop2.tgz
+cd flink-{{site.version }}/
+~~~
+
+#### Start a Session
+
+Use the following command to start a session:
+
+~~~bash
+./bin/yarn-session.sh
+~~~
+
+This command will show you the following overview:
+
+~~~bash
+Usage:
+   Required
+     -n,--container <arg>   Number of YARN container to allocate (=Number of 
Task Managers)
+   Optional
+     -D <arg>                        Dynamic properties
+     -d,--detached                   Start detached
+     -jm,--jobManagerMemory <arg>    Memory for JobManager Container [in MB]
+     -nm,--name                      Set a custom name for the application on 
YARN
+     -q,--query                      Display available YARN resources (memory, 
cores)
+     -qu,--queue <arg>               Specify YARN queue.
+     -s,--slots <arg>                Number of slots per TaskManager
+     -tm,--taskManagerMemory <arg>   Memory per TaskManager Container [in MB]
+     -z,--zookeeperNamespace <arg>   Namespace to create the Zookeeper 
sub-paths for HA mode
+~~~
+
+Please note that the Client requires the `YARN_CONF_DIR` or `HADOOP_CONF_DIR` 
environment variable to be set to read the YARN and HDFS configuration.
+
+**Example:** Issue the following command to allocate 10 Task Managers, with 8 
GB of memory and 32 processing slots each:
+
+~~~bash
+./bin/yarn-session.sh -n 10 -tm 8192 -s 32
+~~~
+
+The system will use the configuration in `conf/flink-conf.yaml`. Please follow 
our [configuration guide]({{ site.baseurl }}/ops/config.html) if you want to 
change something.
+
+Flink on YARN will overwrite the following configuration parameters: 
`jobmanager.rpc.address` (because the JobManager is always allocated on 
different machines), `taskmanager.tmp.dirs` (we are using the tmp directories 
given by YARN) and `parallelism.default` if the number of slots has been 
specified.
+
+If you don't want to change the configuration file to set configuration 
parameters, there is the option to pass dynamic properties via the `-D` flag. 
So you can pass parameters this way: `-Dfs.overwrite-files=true 
-Dtaskmanager.network.memory.min=536346624`.
+
+The example invocation starts 11 containers (even though only 10 containers 
were requested), since there is one additional container for the 
ApplicationMaster and Job Manager.
+
+Once Flink is deployed in your YARN cluster, it will show you the connection 
details of the Job Manager.
+
+Stop the YARN session by stopping the unix process (using CTRL+C) or by 
entering 'stop' into the client.
+
+Flink on YARN will only start all requested containers if enough resources are 
available on the cluster. Most YARN schedulers account for the requested memory 
of the containers,
+and some also account for the number of vcores. By default, the number of 
vcores is equal to the processing slots (`-s`) argument. The 
`yarn.containers.vcores` configuration option allows overriding the
+number of vcores with a custom value.
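+
+For instance, a `conf/flink-conf.yaml` entry along the following lines would
+request 4 vcores per container regardless of the slot count. The value 4 is
+an arbitrary choice for this sketch:
+
+~~~yaml
+# Illustrative value only
+yarn.containers.vcores: 4
+~~~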
+
+#### Detached YARN Session
+
+If you do not want to keep the Flink YARN client running all the time, it's 
also possible to start a *detached* YARN session.
+The parameter for that is called `-d` or `--detached`.
+
+In that case, the Flink YARN client will only submit Flink to the cluster and 
then close itself.
+Note that in this case it's not possible to stop the YARN session using Flink.
+
+Use the YARN utilities (`yarn application -kill <appId>`) to stop the YARN 
session.
+
+#### Attach to an existing Session
+
+Use the following command to attach to a session:
+
+~~~bash
+./bin/yarn-session.sh
+~~~
+
+This command will show you the following overview:
+
+~~~bash
+Usage:
+   Required
+     -id,--applicationId <yarnAppId> YARN application Id
+~~~
+
+As already mentioned, the `YARN_CONF_DIR` or `HADOOP_CONF_DIR` environment 
variable must be set to read the YARN and HDFS configuration.
+
+**Example:** Issue the following command to attach to the running Flink YARN 
session `application_1463870264508_0029`:
+
+~~~bash
+./bin/yarn-session.sh -id application_1463870264508_0029
+~~~
+
+Attaching to a running session uses the YARN ResourceManager to determine the 
JobManager RPC port.
+
+Stop the YARN session by stopping the unix process (using CTRL+C) or by 
entering 'stop' into the client.
+
+### Submit Job to Flink
+
+Use the following command to submit a Flink program to the YARN cluster:
+
+~~~bash
+./bin/flink
+~~~
+
+Please refer to the documentation of the [command-line client]({{ site.baseurl 
}}/ops/cli.html).
+
+The command will show you a help menu like this:
+
+~~~bash
+[...]
+Action "run" compiles and runs a program.
+
+  Syntax: run [OPTIONS] <jar-file> <arguments>
+  "run" action arguments:
+     -c,--class <classname>           Class with the program entry point 
("main"
+                                      method or "getPlan()" method. Only needed
+                                      if the JAR file does not specify the 
class
+                                      in its manifest.
+     -m,--jobmanager <host:port>      Address of the JobManager (master) to
+                                      which to connect. Use this flag to 
connect
+                                      to a different JobManager than the one
+                                      specified in the configuration.
+     -p,--parallelism <parallelism>   The parallelism with which to run the
+                                      program. Optional flag to override the
+                                      default value specified in the
+                                      configuration
+~~~
+
+Use the *run* action to submit a job to YARN. The client is able to determine 
the address of the JobManager. In the rare event of a problem, you can also 
pass the JobManager address using the `-m` argument. The JobManager address is 
visible in the YARN console.
+
+**Example**
+
+~~~bash
+wget -O LICENSE-2.0.txt http://www.apache.org/licenses/LICENSE-2.0.txt
+hadoop fs -copyFromLocal LICENSE-2.0.txt hdfs:/// ...
+./bin/flink run ./examples/batch/WordCount.jar \
+        hdfs:///..../LICENSE-2.0.txt hdfs:///.../wordcount-result.txt
+~~~
+
+If you see the following error, make sure that all TaskManagers have started:
+
+~~~bash
+Exception in thread "main" org.apache.flink.compiler.CompilerException:
+    Available instances could not be determined from job manager: Connection 
timed out.
+~~~
+
+You can check the number of TaskManagers in the JobManager web interface. The 
address of this interface is printed in the YARN session console.
+
+If the TaskManagers do not show up after a minute, you should investigate the 
issue using the log files.
+
+
+## Run a single Flink job on YARN
+
+The documentation above describes how to start a Flink cluster within a Hadoop 
YARN environment. It is also possible to launch Flink within YARN only for 
executing a single job.
+
+Please note that the client then expects the `-yn` value to be set (number of 
TaskManagers).
+
+***Example:***
+
+~~~bash
+./bin/flink run -m yarn-cluster -yn 2 ./examples/batch/WordCount.jar
+~~~
+
+The command line options of the YARN session are also available with the 
`./bin/flink` tool. They are prefixed with a `y` or `yarn` (for the long 
argument options).
+
+Note: You can use a different configuration directory per job by setting the 
environment variable `FLINK_CONF_DIR`. To use this, copy the `conf` directory 
from the Flink distribution and modify, for example, the logging settings on a 
per-job basis.
+
+Note: It is possible to combine `-m yarn-cluster` with a detached YARN 
submission (`-yd`) to "fire and forget" a Flink job to the YARN cluster. In 
this case, your application will not get any accumulator results or exceptions 
from the ExecutionEnvironment.execute() call!
+
+### User jars & Classpath
+
+By default, Flink will include the user jars in the system classpath when 
running a single job. This behavior can be controlled with the 
`yarn.per-job-cluster.include-user-jar` parameter.
+
+When setting this to `DISABLED`, Flink will include the jar in the user 
classpath instead.
+
+The position of the user jars in the class path can be controlled by setting 
the parameter to one of the following:
+
+- `ORDER`: (default) Adds the jar to the system class path based on the 
lexicographic order.
+- `FIRST`: Adds the jar to the beginning of the system class path.
+- `LAST`: Adds the jar to the end of the system class path.
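+
+A minimal sketch of the corresponding `conf/flink-conf.yaml` entry, assuming
+you want the user jar placed ahead of the other system class path entries:
+
+~~~yaml
+# FIRST is one of the values listed above; pick the one you need.
+yarn.per-job-cluster.include-user-jar: FIRST
+~~~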
+
+## Recovery behavior of Flink on YARN
+
+Flink's YARN client has the following configuration parameters to control how 
to behave in case of container failures. These parameters can be set either 
from the `conf/flink-conf.yaml` or when starting the YARN session, using `-D` 
parameters.
+
+- `yarn.reallocate-failed`: This parameter controls whether Flink should 
reallocate failed TaskManager containers. Default: true
+- `yarn.maximum-failed-containers`: The maximum number of failed containers 
the ApplicationMaster accepts until it fails the YARN session. Default: The 
number of initially requested TaskManagers (`-n`).
+- `yarn.application-attempts`: The number of ApplicationMaster (+ its 
TaskManager containers) attempts. If this value is set to 1 (default), the 
entire YARN session will fail when the Application master fails. Higher values 
specify the number of restarts of the ApplicationMaster by YARN.
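+
+As a sketch, these parameters might be set in `conf/flink-conf.yaml` like
+this. The numbers are illustrative assumptions, not recommendations:
+
+~~~yaml
+# Illustrative values only
+yarn.reallocate-failed: true
+yarn.maximum-failed-containers: 8
+yarn.application-attempts: 4
+~~~
+
+The same keys can be passed when starting a session, for example
+`-Dyarn.maximum-failed-containers=8`.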
+
+## Debugging a failed YARN session
+
+There are many reasons why a Flink YARN session deployment can fail, such as a 
misconfigured Hadoop setup (HDFS permissions, YARN configuration), version 
incompatibilities (running Flink with vanilla Hadoop dependencies on Cloudera 
Hadoop), or other errors.
+
+### Log Files
+
+In cases where the Flink YARN session fails during the deployment itself, 
users have to rely on the logging capabilities of Hadoop YARN. The most useful 
feature for that is the [YARN log 
aggregation](http://hortonworks.com/blog/simplifying-user-logs-management-and-access-in-yarn/).
+To enable it, users have to set the `yarn.log-aggregation-enable` property to 
`true` in the `yarn-site.xml` file.
+Once that is enabled, users can use the following command to retrieve all log 
files of a (failed) YARN session.
+
+~~~
+yarn logs -applicationId <application ID>
+~~~
+
+Note that it takes a few seconds after the session has finished until the logs 
show up.
+
+### YARN Client console & Web interfaces
+
+The Flink YARN client also prints error messages in the terminal if errors 
occur during runtime (for example if a TaskManager stops working after some 
time).
+
+In addition to that, there is the YARN Resource Manager web interface (by 
default on port 8088). The port of the Resource Manager web interface is 
determined by the `yarn.resourcemanager.webapp.address` configuration value.
+
+It provides access to log files for running YARN applications and shows 
diagnostics for failed apps.
+
+## Build YARN client for a specific Hadoop version
+
+Users using Hadoop distributions from companies like Hortonworks, Cloudera or 
MapR might have to build Flink against their specific versions of Hadoop (HDFS) 
and YARN. Please read the [build instructions]({{ site.baseurl 
}}/start/building.html) for more details.
+
+## Running Flink on YARN behind Firewalls
+
+Some YARN clusters use firewalls for controlling the network traffic between 
the cluster and the rest of the network.
+In those setups, Flink jobs can only be submitted to a YARN session from 
within the cluster's network (behind the firewall).
+If this is not feasible for production use, Flink allows you to configure a 
port range for all relevant services. With these
+ranges configured, users can also submit jobs to Flink crossing the firewall.
+
+Currently, two services are needed to submit a job:
+
+ * The JobManager (ApplicationMaster in YARN)
+ * The BlobServer running within the JobManager.
+
+When submitting a job to Flink, the BlobServer will distribute the jars with 
the user code to all worker nodes (TaskManagers).
+The JobManager receives the job itself and triggers the execution.
+
+The two configuration parameters for specifying the ports are the following:
+
+ * `yarn.application-master.port`
+ * `blob.server.port`
+
+These two configuration options accept single ports (for example: "50010"), 
ranges ("50000-50025"), or a combination of
+both ("50010,50011,50020-50025,50050-50075").
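+
+For example, a cluster whose firewall opens ports 50000 to 50200 might use
+entries like the following in `conf/flink-conf.yaml`. The ranges are
+assumptions made for this sketch:
+
+~~~yaml
+# Hypothetical port ranges; match them to your firewall rules.
+yarn.application-master.port: 50100-50200
+blob.server.port: 50000-50025
+~~~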
+
+(Hadoop uses a similar mechanism; there, the configuration parameter is 
called `yarn.app.mapreduce.am.job.client.port-range`.)
+
+## Background / Internals
+
+This section briefly describes how Flink and YARN interact.
+
+<img src="{{ site.baseurl }}/fig/FlinkOnYarn.svg" class="img-responsive">
+
+The YARN client needs to access the Hadoop configuration to connect to the 
YARN resource manager and to HDFS. It determines the Hadoop configuration using 
the following strategy:
+
+* Test if `YARN_CONF_DIR`, `HADOOP_CONF_DIR` or `HADOOP_CONF_PATH` are set (in 
that order). If one of these variables is set, it is used to read the 
configuration.
+* If the above strategy fails (this should not be the case in a correct YARN 
setup), the client uses the `HADOOP_HOME` environment variable. If it is 
set, the client tries to access `$HADOOP_HOME/etc/hadoop` (Hadoop 2) and 
`$HADOOP_HOME/conf` (Hadoop 1).
+
+When starting a new Flink YARN session, the client first checks if the 
requested resources (containers and memory) are available. After that, it 
uploads a jar that contains Flink and the configuration to HDFS (step 1).
+
+The next step of the client is to request (step 2) a YARN container to start 
the *ApplicationMaster* (step 3). Since the client registered the configuration 
and jar-file as a resource for the container, the NodeManager of YARN running 
on that particular machine will take care of preparing the container (e.g. 
downloading the files). Once that has finished, the *ApplicationMaster* (AM) is 
started.
+
+The *JobManager* and AM are running in the same container. Once they have 
started successfully, the AM knows the address of the JobManager (its own 
host). It then generates a new Flink configuration file for the TaskManagers 
(so that they can connect to the JobManager). The file is also uploaded to 
HDFS. Additionally, the *AM* container also serves Flink's web interface. All 
ports the YARN code allocates are *ephemeral ports*. This allows users to 
execute multiple Flink YARN sessions in parallel.
+
+After that, the AM starts allocating the containers for Flink's TaskManagers, 
which will download the jar file and the modified configuration from HDFS. 
Once these steps are completed, Flink is set up and ready to accept jobs.

http://git-wip-us.apache.org/repos/asf/flink/blob/47070674/docs/ops/index.md
----------------------------------------------------------------------
diff --git a/docs/ops/index.md b/docs/ops/index.md
new file mode 100644
index 0000000..a2e33ad
--- /dev/null
+++ b/docs/ops/index.md
@@ -0,0 +1,25 @@
+---
+title: "Deployment & Operations"
+nav-id: ops
+nav-title: '<i class="fa fa-sliders title maindish" aria-hidden="true"></i> 
Deployment & Operations'
+nav-parent_id: root
+nav-pos: 6
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->

http://git-wip-us.apache.org/repos/asf/flink/blob/47070674/docs/ops/jobmanager_high_availability.md
----------------------------------------------------------------------
diff --git a/docs/ops/jobmanager_high_availability.md 
b/docs/ops/jobmanager_high_availability.md
new file mode 100644
index 0000000..7dd7d4c
--- /dev/null
+++ b/docs/ops/jobmanager_high_availability.md
@@ -0,0 +1,239 @@
+---
+title: "JobManager High Availability (HA)"
+nav-title: High Availability (HA)
+nav-parent_id: ops
+nav-pos: 2
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+The JobManager coordinates every Flink deployment. It is responsible for both 
*scheduling* and *resource management*.
+
+By default, there is a single JobManager instance per Flink cluster. This 
creates a *single point of failure* (SPOF): if the JobManager crashes, no new 
programs can be submitted and running programs fail.
+
+With JobManager High Availability, you can recover from JobManager failures 
and thereby eliminate the *SPOF*. You can configure high availability for both 
**standalone** and **YARN clusters**.
+
+* Toc
+{:toc}
+
+## Standalone Cluster High Availability
+
+The general idea of JobManager high availability for standalone clusters is 
that there is a **single leading JobManager** at any time and **multiple 
standby JobManagers** to take over leadership in case the leader fails. This 
guarantees that there is **no single point of failure** and programs can make 
progress as soon as a standby JobManager has taken leadership. There is no 
explicit distinction between standby and master JobManager instances. Each 
JobManager can take the role of master or standby.
+
+As an example, consider the following setup with three JobManager instances:
+
+<img src="{{ site.baseurl }}/fig/jobmanager_ha_overview.png" class="center" />
+
+### Configuration
+
+To enable JobManager High Availability you have to set the **high-availability 
mode** to *zookeeper*, configure a **ZooKeeper quorum** and set up a **masters 
file** with all JobManager hosts and their web UI ports.
+
+Flink leverages **[ZooKeeper](http://zookeeper.apache.org)** for *distributed 
coordination* between all running JobManager instances. ZooKeeper is a separate 
service from Flink, which provides highly reliable distributed coordination via 
leader election and light-weight consistent state storage. Check out 
[ZooKeeper's Getting Started 
Guide](http://zookeeper.apache.org/doc/trunk/zookeeperStarted.html) for more 
information about ZooKeeper. Flink includes scripts to [bootstrap a simple 
ZooKeeper](#bootstrap-zookeeper) installation.
+
+#### Masters File (masters)
+
+In order to start an HA-cluster, configure the *masters* file in `conf/masters`:
+
+- **masters file**: The *masters file* contains all hosts, on which 
JobManagers are started, and the ports to which the web user interface binds.
+
+  <pre>
+jobManagerAddress1:webUIPort1
+[...]
+jobManagerAddressX:webUIPortX
+  </pre>
+
+By default, the job manager will pick a *random port* for inter-process 
communication. You can change this via the 
**`high-availability.jobmanager.port`** key. This key accepts single ports 
(e.g. `50010`), ranges (`50000-50025`), or a combination of both 
(`50010,50011,50020-50025,50050-50075`).
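+
+As an illustration, restricting the port to a fixed range in
+`conf/flink-conf.yaml` could look like this (the range itself is an
+arbitrary assumption):
+
+<pre>high-availability.jobmanager.port: 50000-50025</pre>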
+
+#### Config File (flink-conf.yaml)
+
+In order to start an HA-cluster add the following configuration keys to 
`conf/flink-conf.yaml`:
+
+- **high-availability mode** (required): The *high-availability mode* has to 
be set in `conf/flink-conf.yaml` to *zookeeper* in order to enable high 
availability mode.
+
+  <pre>high-availability: zookeeper</pre>
+
+- **ZooKeeper quorum** (required): A *ZooKeeper quorum* is a replicated group 
of ZooKeeper servers, which provide the distributed coordination service.
+
+  <pre>high-availability.zookeeper.quorum: 
address1:2181[,...],addressX:2181</pre>
+
+  Each *addressX:port* refers to a ZooKeeper server, which is reachable by 
Flink at the given address and port.
+
+- **ZooKeeper root** (recommended): The *root ZooKeeper node*, under which all 
cluster nodes are placed.
+
+  <pre>high-availability.zookeeper.path.root: /flink</pre>
+
+- **ZooKeeper cluster-id** (recommended): The *cluster-id ZooKeeper node*, 
under which all required coordination data for a cluster is placed.
+
+  <pre>high-availability.zookeeper.path.cluster-id: /default_ns # important: 
customize per cluster</pre>
+
+  **Important**: You should not set this value manually when running a YARN
+  cluster, a per-job YARN session, or on another cluster manager. In those
+  cases a cluster-id is automatically being generated based on the application
+  id. Manually setting a cluster-id overrides this behaviour in YARN.
+  Specifying a cluster-id with the -z CLI option, in turn, overrides manual
+  configuration. If you are running multiple Flink HA clusters on bare metal,
+  you have to manually configure separate cluster-ids for each cluster.
+
+- **Storage directory** (required): JobManager metadata is persisted in the 
file system *storageDir* and only a pointer to this state is stored in 
ZooKeeper.
+
+    <pre>
+high-availability.zookeeper.storageDir: hdfs:///flink/recovery
+    </pre>
+
+    The `storageDir` stores all metadata needed to recover a JobManager 
failure.
+
+After configuring the masters and the ZooKeeper quorum, you can use the 
provided cluster startup scripts as usual. They will start an HA-cluster. Keep 
in mind that the **ZooKeeper quorum has to be running** when you call the 
scripts and make sure to **configure a separate ZooKeeper root path** for each 
HA cluster you are starting.
+
+#### Example: Standalone Cluster with 2 JobManagers
+
+1. **Configure high availability mode and ZooKeeper quorum** in 
`conf/flink-conf.yaml`:
+
+   <pre>
+high-availability: zookeeper
+high-availability.zookeeper.quorum: localhost:2181
+high-availability.zookeeper.path.root: /flink
+high-availability.zookeeper.path.cluster-id: /cluster_one # important: 
customize per cluster
+high-availability.zookeeper.storageDir: hdfs:///flink/recovery</pre>
+
+2. **Configure masters** in `conf/masters`:
+
+   <pre>
+localhost:8081
+localhost:8082</pre>
+
+3. **Configure ZooKeeper server** in `conf/zoo.cfg` (currently it's only 
possible to run a single ZooKeeper server per machine):
+
+   <pre>server.0=localhost:2888:3888</pre>
+
+4. **Start ZooKeeper quorum**:
+
+   <pre>
+$ bin/start-zookeeper-quorum.sh
+Starting zookeeper daemon on host localhost.</pre>
+
+5. **Start an HA-cluster**:
+
+   <pre>
+$ bin/start-cluster.sh
+Starting HA cluster with 2 masters and 1 peers in ZooKeeper quorum.
+Starting jobmanager daemon on host localhost.
+Starting jobmanager daemon on host localhost.
+Starting taskmanager daemon on host localhost.</pre>
+
+6. **Stop ZooKeeper quorum and cluster**:
+
+   <pre>
+$ bin/stop-cluster.sh
+Stopping taskmanager daemon (pid: 7647) on localhost.
+Stopping jobmanager daemon (pid: 7495) on host localhost.
+Stopping jobmanager daemon (pid: 7349) on host localhost.
+$ bin/stop-zookeeper-quorum.sh
+Stopping zookeeper daemon (pid: 7101) on host localhost.</pre>
+
+## YARN Cluster High Availability
+
+When running a highly available YARN cluster, **we don't run multiple 
JobManager (ApplicationMaster) instances**, but only one, which is restarted by 
YARN on failures. The exact behaviour depends on the specific YARN version 
you are using.
+
+### Configuration
+
+#### Maximum Application Master Attempts (yarn-site.xml)
+
+You have to configure the maximum number of attempts for the application 
masters for **your** YARN setup in `yarn-site.xml`:
+
+{% highlight xml %}
+<property>
+  <name>yarn.resourcemanager.am.max-attempts</name>
+  <value>4</value>
+  <description>
+    The maximum number of application master execution attempts.
+  </description>
+</property>
+{% endhighlight %}
+
+The default for current YARN versions is 2 (meaning a single JobManager 
failure is tolerated).
+
+#### Application Attempts (flink-conf.yaml)
+
+In addition to the HA configuration ([see above](#configuration)), you have to 
configure the maximum attempts in `conf/flink-conf.yaml`:
+
+<pre>yarn.application-attempts: 10</pre>
+
+This means that the application can be restarted 10 times before YARN fails 
the application. It's important to note that 
`yarn.resourcemanager.am.max-attempts` is an upper bound for the application 
restarts. Therefore, the number of application attempts set within Flink cannot 
exceed the YARN cluster setting with which YARN was started.
+
+#### Container Shutdown Behaviour
+
+- **YARN 2.3.0 < version < 2.4.0**. All containers are restarted if the 
application master fails.
+- **YARN 2.4.0 < version < 2.6.0**. TaskManager containers are kept alive 
across application master failures. This has the advantage that the startup 
time is faster and that the user does not have to wait for obtaining the 
container resources again.
+- **YARN 2.6.0 <= version**: Sets the attempt failure validity interval to 
Flink's Akka timeout value. The attempt failure validity interval says that an 
application is only killed after the system has seen the maximum number of 
application attempts during one interval. This prevents a long-running job 
from depleting its application attempts.
+
+<p style="border-radius: 5px; padding: 5px" class="bg-danger"><b>Note</b>: 
Hadoop YARN 2.4.0 has a major bug (fixed in 2.5.0) preventing container 
restarts from a restarted Application Master/Job Manager container. See <a 
href="https://issues.apache.org/jira/browse/FLINK-4142">FLINK-4142</a> for 
details. We recommend using at least Hadoop 2.5.0 for high availability setups 
on YARN.</p>
+
+#### Example: Highly Available YARN Session
+
+1. **Configure HA mode and ZooKeeper quorum** in `conf/flink-conf.yaml`:
+
+   <pre>
+high-availability: zookeeper
+high-availability.zookeeper.quorum: localhost:2181
+high-availability.zookeeper.storageDir: hdfs:///flink/recovery
+high-availability.zookeeper.path.root: /flink
+yarn.application-attempts: 10</pre>
+
+2. **Configure ZooKeeper server** in `conf/zoo.cfg` (currently it's only 
possible to run a single ZooKeeper server per machine):
+
+   <pre>server.0=localhost:2888:3888</pre>
+
+3. **Start ZooKeeper quorum**:
+
+   <pre>
+$ bin/start-zookeeper-quorum.sh
+Starting zookeeper daemon on host localhost.</pre>
+
+4. **Start an HA-cluster**:
+
+   <pre>
+$ bin/yarn-session.sh -n 2</pre>
+
+## Configuring for ZooKeeper Security
+
+If ZooKeeper is running in secure mode with Kerberos, you can override the 
following configurations in `flink-conf.yaml` as necessary:
+
+<pre>
+zookeeper.sasl.service-name: zookeeper     # default is "zookeeper". If the 
ZooKeeper quorum is configured
+                                           # with a different service name 
then it can be supplied here.
+zookeeper.sasl.login-context-name: Client  # default is "Client". The value 
needs to match one of the values
+                                           # configured in 
"security.kerberos.login.contexts".
+</pre>
+
+For more information on Flink configuration for Kerberos security, please see 
[here]({{ site.baseurl}}/ops/config.html).
+You can also find [here]({{ site.baseurl}}/ops/security-kerberos.html) further 
details on how Flink internally sets up Kerberos-based security.
+
+## Bootstrap ZooKeeper
+
+If you don't have a running ZooKeeper installation, you can use the helper 
scripts, which ship with Flink.
+
+There is a ZooKeeper configuration template in `conf/zoo.cfg`. You can 
configure the hosts to run ZooKeeper on with the `server.X` entries, where X is 
a unique ID of each server:
+
+<pre>
+server.X=addressX:peerPort:leaderPort
+[...]
+server.Y=addressY:peerPort:leaderPort
+</pre>
+
+The script `bin/start-zookeeper-quorum.sh` will start a ZooKeeper server on 
each of the configured hosts. The started processes start ZooKeeper servers via 
a Flink wrapper, which reads the configuration from `conf/zoo.cfg` and makes 
sure to set some required configuration values for convenience. In production 
setups, it is recommended to manage your own ZooKeeper installation.
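
The `server.X=addressX:peerPort:leaderPort` entries described above can be checked before starting the quorum. The following is a minimal sketch, not part of Flink: the class `ZooCfgServers` and its `parse` method are hypothetical names, and it assumes plain host names or IPv4 addresses (no IPv6 literals) in the address field.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not shipped with Flink): reads server.X entries from zoo.cfg-style lines. */
final class ZooCfgServers {

    /** One parsed entry of the form server.<id>=<address>:<peerPort>:<leaderPort>. */
    static final class Server {
        final int id;
        final String address;
        final int peerPort;
        final int leaderPort;

        Server(int id, String address, int peerPort, int leaderPort) {
            this.id = id;
            this.address = address;
            this.peerPort = peerPort;
            this.leaderPort = leaderPort;
        }
    }

    /** Returns all server.X entries; every other line (comments, other keys) is ignored. */
    static List<Server> parse(List<String> lines) {
        List<Server> servers = new ArrayList<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (!line.startsWith("server.")) {
                continue;
            }
            int eq = line.indexOf('=');
            if (eq < 0) {
                throw new IllegalArgumentException("Missing '=' in: " + line);
            }
            int id = Integer.parseInt(line.substring("server.".length(), eq).trim());
            // Assumption: the address contains no ':' (so no IPv6 literals).
            String[] parts = line.substring(eq + 1).trim().split(":");
            if (parts.length != 3) {
                throw new IllegalArgumentException("Expected address:peerPort:leaderPort in: " + line);
            }
            servers.add(new Server(id, parts[0],
                    Integer.parseInt(parts[1]), Integer.parseInt(parts[2])));
        }
        return servers;
    }

    public static void main(String[] args) {
        List<Server> servers = parse(Arrays.asList(
                "# sample zoo.cfg content",
                "server.0=localhost:2888:3888"));
        for (Server s : servers) {
            System.out.println(s.id + " -> " + s.address + " (peer " + s.peerPort
                    + ", leader " + s.leaderPort + ")");
        }
    }
}
```

A check like this only validates the entry syntax; it does not confirm that the hosts are reachable or that the IDs are unique.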

http://git-wip-us.apache.org/repos/asf/flink/blob/47070674/docs/ops/production_ready.md
----------------------------------------------------------------------
diff --git a/docs/ops/production_ready.md b/docs/ops/production_ready.md
index 2cce8d0..c58ce5b 100644
--- a/docs/ops/production_ready.md
+++ b/docs/ops/production_ready.md
@@ -1,7 +1,7 @@
 ---
 title: "Production Readiness Checklist"
-nav-parent_id: setup
-nav-pos: 20
+nav-parent_id: ops
+nav-pos: 5
 ---
 <!--
 Licensed to the Apache Software Foundation (ASF) under one
@@ -64,7 +64,7 @@ parallelism as a function of the parallelism when the job is 
first started:
 
 ### Set UUIDs for operators
 
-As mentioned in the documentation for [savepoints]({{ site.baseurl 
}}/setup/savepoints.html), users should set uids for
+As mentioned in the documentation for [savepoints]({{ site.baseurl 
}}/ops/state/savepoints.html), users should set uids for
 operators. Those operator uids are important for Flink's mapping of operator 
states to operators which, in turn, is 
 essential for savepoints. By default operator uids are generated by traversing 
the JobGraph and hashing certain operator 
 properties. While this is comfortable from a user perspective, it is also very 
fragile, as changes to the JobGraph (e.g.

http://git-wip-us.apache.org/repos/asf/flink/blob/47070674/docs/ops/security-kerberos.md
----------------------------------------------------------------------
diff --git a/docs/ops/security-kerberos.md b/docs/ops/security-kerberos.md
index 3e5cad9..eac72f1 100644
--- a/docs/ops/security-kerberos.md
+++ b/docs/ops/security-kerberos.md
@@ -1,6 +1,6 @@
 ---
 title:  "Kerberos Authentication Setup and Configuration"
-nav-parent_id: setup
+nav-parent_id: ops
 nav-pos: 10
 nav-title: Kerberos
 ---
@@ -83,7 +83,7 @@ Here is some information specific to each deployment mode.
 
 Steps to run a secure Flink cluster in standalone/cluster mode:
 
-1. Add security-related configuration options to the Flink configuration file 
(on all cluster nodes) (see 
[here]({{site.baseurl}}/setup/config.html#kerberos-based-security)).
+1. Add security-related configuration options to the Flink configuration file 
(on all cluster nodes) (see [here](config.html#kerberos-based-security)).
 2. Ensure that the keytab file exists at the path indicated by 
`security.kerberos.login.keytab` on all cluster nodes.
 3. Deploy Flink cluster as normal.
 
@@ -91,7 +91,7 @@ Steps to run a secure Flink cluster in standalone/cluster 
mode:
 
 Steps to run a secure Flink cluster in YARN/Mesos mode:
 
-1. Add security-related configuration options to the Flink configuration file 
on the client (see 
[here]({{site.baseurl}}/setup/config.html#kerberos-based-security)).
+1. Add security-related configuration options to the Flink configuration file 
on the client (see [here](config.html#kerberos-based-security)).
 2. Ensure that the keytab file exists at the path as indicated by 
`security.kerberos.login.keytab` on the client node.
 3. Deploy Flink cluster as normal.
 
@@ -107,7 +107,7 @@ The main drawback is that the cluster is necessarily 
short-lived since the gener
 
 Steps to run a secure Flink cluster using `kinit`:
 
-1. Add security-related configuration options to the Flink configuration file 
on the client (see 
[here]({{site.baseurl}}/setup/config.html#kerberos-based-security)).
+1. Add security-related configuration options to the Flink configuration file 
on the client (see [here](config.html#kerberos-based-security)).
 2. Login using the `kinit` command.
 3. Deploy Flink cluster as normal.
 

http://git-wip-us.apache.org/repos/asf/flink/blob/47070674/docs/ops/security-ssl.md
----------------------------------------------------------------------
diff --git a/docs/ops/security-ssl.md b/docs/ops/security-ssl.md
new file mode 100644
index 0000000..7c7268a
--- /dev/null
+++ b/docs/ops/security-ssl.md
@@ -0,0 +1,144 @@
+---
+title: "SSL Setup"
+nav-parent_id: ops
+nav-pos: 10
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+This page provides instructions on how to enable SSL for the network 
communication between different Flink components.
+
+## SSL Configuration
+
+SSL can be enabled for all network communication between Flink components. SSL 
keystores and truststores have to be deployed on each Flink node and configured 
(conf/flink-conf.yaml) using keys in the security.ssl.* namespace (please see 
the [configuration page](config.html) for details). SSL can be selectively 
enabled/disabled for different transports using the following flags. These 
flags are only applicable when security.ssl.enabled is set to true.
+
+* **taskmanager.data.ssl.enabled**: SSL flag for data communication between 
task managers
+* **blob.service.ssl.enabled**: SSL flag for blob service client/server 
communication
+* **akka.ssl.enabled**: SSL flag for the akka based control connection between 
the flink client, jobmanager and taskmanager 
+* **jobmanager.web.ssl.enabled**: Flag to enable https access to the 
jobmanager's web frontend
+
+## Deploying Keystores and Truststores
+
+You need to have a Java Keystore generated and copied to each node in the 
Flink cluster. The common name or subject alternative names in the certificate 
should match the node's hostname and IP address. Keystores and truststores can 
be generated using the keytool utility 
(https://docs.oracle.com/javase/8/docs/technotes/tools/unix/keytool.html). All 
Flink components should have read access to the keystore and truststore files.
+
+### Example: Creating self signed CA and keystores for a 2 node cluster
+
+Execute the following keytool commands to create a truststore with a self 
signed CA
+
+~~~
+keytool -genkeypair -alias ca -keystore ca.keystore -dname "CN=Sample CA" 
-storepass password -keypass password -keyalg RSA -ext bc=ca:true
+keytool -keystore ca.keystore -storepass password -alias ca -exportcert > 
ca.cer
+keytool -importcert -keystore ca.truststore -alias ca -storepass password 
-noprompt -file ca.cer
+~~~
+
+Now create keystores for each node with certificates signed by the above CA. 
Let node1.company.org and node2.company.org be the hostnames with IPs 
192.168.1.1 and 192.168.1.2 respectively
+
+#### Node 1
+~~~
+keytool -genkeypair -alias node1 -keystore node1.keystore -dname 
"CN=node1.company.org" -ext SAN=dns:node1.company.org,ip:192.168.1.1 -storepass 
password -keypass password -keyalg RSA
+keytool -certreq -keystore node1.keystore -storepass password -alias node1 
-file node1.csr
+keytool -gencert -keystore ca.keystore -storepass password -alias ca -ext 
SAN=dns:node1.company.org,ip:192.168.1.1 -infile node1.csr -outfile node1.cer
+keytool -importcert -keystore node1.keystore -storepass password -file ca.cer 
-alias ca -noprompt
+keytool -importcert -keystore node1.keystore -storepass password -file 
node1.cer -alias node1 -noprompt
+~~~
+
+#### Node 2
+~~~
+keytool -genkeypair -alias node2 -keystore node2.keystore -dname 
"CN=node2.company.org" -ext SAN=dns:node2.company.org,ip:192.168.1.2 -storepass 
password -keypass password -keyalg RSA
+keytool -certreq -keystore node2.keystore -storepass password -alias node2 
-file node2.csr
+keytool -gencert -keystore ca.keystore -storepass password -alias ca -ext 
SAN=dns:node2.company.org,ip:192.168.1.2 -infile node2.csr -outfile node2.cer
+keytool -importcert -keystore node2.keystore -storepass password -file ca.cer 
-alias ca -noprompt
+keytool -importcert -keystore node2.keystore -storepass password -file 
node2.cer -alias node2 -noprompt
+~~~
+
+## Standalone Deployment
+Configure each node in the standalone cluster to pick up the keystore and 
truststore files present in the local file system.
+
+### Example: 2 node cluster
+
+* Generate 2 keystores, one for each node, and copy them to the filesystem on 
the respective node. Also copy the public key of the CA (which was used to sign 
the certificates in the keystore) as a Java truststore on both the nodes
+* Configure conf/flink-conf.yaml to pick up these files
+
+#### Node 1
+~~~
+security.ssl.enabled: true
+security.ssl.keystore: /usr/local/node1.keystore
+security.ssl.keystore-password: abc123
+security.ssl.key-password: abc123
+security.ssl.truststore: /usr/local/ca.truststore
+security.ssl.truststore-password: abc123
+~~~
+
+#### Node 2
+~~~
+security.ssl.enabled: true
+security.ssl.keystore: /usr/local/node2.keystore
+security.ssl.keystore-password: abc123
+security.ssl.key-password: abc123
+security.ssl.truststore: /usr/local/ca.truststore
+security.ssl.truststore-password: abc123
+~~~
+
+* Restart the flink components to enable SSL for all of flink's internal 
communication
+* Verify by accessing the jobmanager's UI using its HTTPS URL. The task 
manager's path in the UI should show akka.ssl.tcp:// as the protocol
+* The blob server and task manager's data communication can be verified from 
the log files
+
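
Before restarting, it can help to confirm that each node's configuration actually contains all of the `security.ssl.*` keys shown in the per-node examples above. The following is a rough sketch under stated assumptions: `SslConfigCheck` is a hypothetical helper, not a Flink class, and it only understands flat `key: value` lines, which is how the examples on this page are written.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not a Flink API): reports SSL keys missing from flat "key: value" lines. */
final class SslConfigCheck {

    /** The keys used by the example configurations on this page. */
    static final List<String> REQUIRED_KEYS = Arrays.asList(
            "security.ssl.keystore",
            "security.ssl.keystore-password",
            "security.ssl.key-password",
            "security.ssl.truststore",
            "security.ssl.truststore-password");

    /** Parses flat "key: value" lines, skipping blank lines and '#' comments. */
    static Map<String, String> parse(List<String> lines) {
        Map<String, String> conf = new LinkedHashMap<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            int colon = line.indexOf(':'); // first colon separates key from value
            if (colon > 0) {
                conf.put(line.substring(0, colon).trim(), line.substring(colon + 1).trim());
            }
        }
        return conf;
    }

    /** Returns the required keys that are absent; empty if SSL is not enabled at all. */
    static List<String> missingKeys(Map<String, String> conf) {
        if (!"true".equals(conf.get("security.ssl.enabled"))) {
            return Collections.emptyList();
        }
        List<String> missing = new ArrayList<>();
        for (String key : REQUIRED_KEYS) {
            if (!conf.containsKey(key)) {
                missing.add(key);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        Map<String, String> conf = parse(Arrays.asList(
                "security.ssl.enabled: true",
                "security.ssl.keystore: /usr/local/node1.keystore",
                "security.ssl.keystore-password: abc123"));
        System.out.println("Missing keys: " + missingKeys(conf));
    }
}
```

The check is purely syntactic. It does not open the keystore or verify that the passwords match the ones used when the files were generated.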
+## YARN Deployment
+The keystores and truststore can be deployed in a YARN setup in multiple ways 
depending on the cluster setup. Following are 2 ways to achieve this
+
+### 1. Deploy keystores before starting the YARN session
+The keystores and truststore should be generated and deployed on all nodes in 
the YARN setup where flink components can potentially be executed. The same 
flink config file from the flink YARN client is used for all the flink 
components running in the YARN cluster. Therefore we need to ensure the 
keystore is deployed and accessible using the same filepath in all the YARN 
nodes.
+
+#### Example config
+~~~
+security.ssl.enabled: true
+security.ssl.keystore: /usr/local/node.keystore
+security.ssl.keystore-password: abc123
+security.ssl.key-password: abc123
+security.ssl.truststore: /usr/local/ca.truststore
+security.ssl.truststore-password: abc123
+~~~
+
+Now you can start the YARN session from the CLI like you would normally do.
+
+### 2. Use YARN cli to deploy the keystores and truststore
+We can use the YARN client's ship files option (-yt) to distribute the 
keystores and truststore. Since the same keystore will be deployed at all 
nodes, we need to ensure a single certificate in the keystore can be served for 
all nodes. This can be done by either using the Subject Alternative Name (SAN) 
extension in the certificate and setting it to cover all nodes (hostname and IP 
addresses) in the cluster or by using wildcard subdomain names (if the cluster 
is set up accordingly). 
+
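
For clusters with more than a couple of nodes, the SAN value that covers every hostname and IP address is tedious to write by hand. As a small illustration, the sketch below assembles the value passed to keytool's `-ext` option from a host-to-IP map. `SanExtension` is a hypothetical helper name, and the hosts are the example nodes used on this page.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper: builds the value for keytool's -ext option from host/IP pairs. */
final class SanExtension {

    /** Emits SAN=dns:<host>,ip:<ip>,... in the iteration order of the given map. */
    static String build(Map<String, String> hostToIp) {
        StringJoiner san = new StringJoiner(",", "SAN=", "");
        for (Map.Entry<String, String> node : hostToIp.entrySet()) {
            san.add("dns:" + node.getKey());
            san.add("ip:" + node.getValue());
        }
        return san.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order so the output is stable.
        Map<String, String> nodes = new LinkedHashMap<>();
        nodes.put("node1.company.org", "192.168.1.1");
        nodes.put("node2.company.org", "192.168.1.2");
        System.out.println("-ext " + build(nodes));
    }
}
```

The resulting string is meant to be supplied both when generating the key pair and when signing the certificate, as in the per-node keytool commands earlier on this page.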
+#### Example
+* Supply the following parameters to the keytool command when generating the 
keystore: -ext 
SAN=dns:node1.company.org,ip:192.168.1.1,dns:node2.company.org,ip:192.168.1.2
+* Copy the keystore and the CA's truststore into a local directory (at the 
cli's working directory), say deploy-keys/
+* Update the configuration to pick up the files from a relative path
+
+~~~
+security.ssl.enabled: true
+security.ssl.keystore: deploy-keys/node.keystore
+security.ssl.keystore-password: password
+security.ssl.key-password: password
+security.ssl.truststore: deploy-keys/ca.truststore
+security.ssl.truststore-password: password
+~~~
+
+* Start the YARN session using the -yt parameter
+
+~~~
+flink run -m yarn-cluster -yt deploy-keys/ TestJob.jar
+~~~
+
+When deployed using YARN, Flink's web dashboard is accessible through the YARN 
proxy's Tracking URL. To ensure that the YARN proxy is able to access Flink's 
HTTPS URL, you need to configure the YARN proxy to accept Flink's SSL 
certificates. Add the custom CA certificate into Java's default truststore on 
the YARN proxy node.
+

http://git-wip-us.apache.org/repos/asf/flink/blob/47070674/docs/ops/state/checkpoints.md
----------------------------------------------------------------------
diff --git a/docs/ops/state/checkpoints.md b/docs/ops/state/checkpoints.md
new file mode 100644
index 0000000..4f2a9da
--- /dev/null
+++ b/docs/ops/state/checkpoints.md
@@ -0,0 +1,101 @@
+---
+title: "Checkpoints"
+nav-parent_id: ops_state
+nav-pos: 7
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+
+* toc
+{:toc}
+
+## Overview
+
+Checkpoints make state in Flink fault tolerant by allowing state and the
+corresponding stream positions to be recovered, thereby giving the application
+the same semantics as a failure-free execution.
+
+See [Checkpointing](../../dev/stream/state/checkpointing.html) for how to 
enable and
+configure checkpoints for your program.
+
+## Externalized Checkpoints
+
+Checkpoints are by default not persisted externally and are only used to
+resume a job from failures. They are deleted when a program is cancelled.
+You can, however, configure periodic checkpoints to be persisted externally
+similarly to [savepoints](savepoints.html). These *externalized checkpoints*
+write their meta data out to persistent storage and are *not* automatically
+cleaned up when the job fails. This way, you will have a checkpoint around
+to resume from if your job fails.
+
+```java
+CheckpointConfig config = env.getCheckpointConfig();
+config.enableExternalizedCheckpoints(ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);
+```
+
+The `ExternalizedCheckpointCleanup` mode configures what happens with 
externalized checkpoints when you cancel the job:
+
+- **`ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION`**: Retain the 
externalized checkpoint when the job is cancelled. Note that you have to 
manually clean up the checkpoint state after cancellation in this case.
+
+- **`ExternalizedCheckpointCleanup.DELETE_ON_CANCELLATION`**: Delete the 
externalized checkpoint when the job is cancelled. The checkpoint state will 
only be available if the job fails.
+
+### Directory Structure
+
+Similarly to [savepoints](savepoints.html), an externalized checkpoint consists
+of a meta data file and, depending on the state back-end, some additional data
+files. The **target directory** for the externalized checkpoint's meta data is
+determined from the configuration key `state.checkpoints.dir` which, currently,
+can only be set via the configuration files.
+
+```
+state.checkpoints.dir: hdfs:///checkpoints/
+```
+
+This directory will then contain the checkpoint meta data required to restore
+the checkpoint. For the `MemoryStateBackend`, this meta data file will be
+self-contained and no further files are needed.
+
+`FsStateBackend` and `RocksDBStateBackend` write separate data files
+and only write the paths to these files into the meta data file. These data
+files are stored at the path given to the state back-end during construction.
+
+```java
+env.setStateBackend(new RocksDBStateBackend("hdfs:///checkpoints-data/"));
+```
+
+### Difference to Savepoints
+
+Externalized checkpoints have a few differences from 
[savepoints](savepoints.html). They
+- use a state backend specific (low-level) data format,
+- may be incremental,
+- do not support Flink specific features like rescaling.
+
+### Resuming from an externalized checkpoint
+
+A job may be resumed from an externalized checkpoint just as from a savepoint
+by using the checkpoint's meta data file instead (see the
+[savepoint restore guide](../cli.html#restore-a-savepoint)). Note that if the
+meta data file is not self-contained, the jobmanager needs to have access to
+the data files it refers to (see [Directory Structure](#directory-structure)
+above).
+
+```sh
+$ bin/flink run -s :checkpointMetaDataPath [:runArgs]
+```

http://git-wip-us.apache.org/repos/asf/flink/blob/47070674/docs/ops/state/index.md
----------------------------------------------------------------------
diff --git a/docs/ops/state/index.md b/docs/ops/state/index.md
new file mode 100644
index 0000000..8725f87
--- /dev/null
+++ b/docs/ops/state/index.md
@@ -0,0 +1,24 @@
+---
+nav-title: 'State & Fault Tolerance'
+nav-id: ops_state
+nav-parent_id: ops
+nav-pos: 3
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
