http://git-wip-us.apache.org/repos/asf/incubator-gearpump/blob/5f90b70f/docs/contents/introduction/gearpump-internals.md ---------------------------------------------------------------------- diff --git a/docs/contents/introduction/gearpump-internals.md b/docs/contents/introduction/gearpump-internals.md new file mode 100644 index 0000000..bc7e6bf --- /dev/null +++ b/docs/contents/introduction/gearpump-internals.md @@ -0,0 +1,228 @@

### Actor Hierarchy

Everything in the diagram is an actor; they fall into two categories, Cluster Actors and Application Actors.

#### Cluster Actors

**Worker**: Maps to a physical worker machine. It is responsible for managing resources and reporting metrics on that machine.

**Master**: Heart of the cluster; it manages workers, resources, and applications. The main functions are delegated to three child actors: App Manager, Worker Manager, and Resource Scheduler.

#### Application Actors

**AppMaster**: Responsible for scheduling tasks to workers and managing the state of the application. Different applications have different AppMaster instances and are isolated from each other.

**Executor**: Child of AppMaster; represents a JVM process. Its job is to manage the life cycle of tasks and recover them in case of failure.

**Task**: Child of Executor; does the real work. Every task actor has a globally unique address, and one task actor can send data to any other task actor. This gives us great flexibility in how the computation DAG is distributed.

All actors in the graph are woven together with actor supervision and actor watching, and every error is handled properly via supervisors. In a master, risky jobs are isolated and delegated to child actors, making it more robust. In the application, an extra intermediate layer, "Executor", is created so that we can do fine-grained and fast recovery in case of task failure.
A master watches the lifecycle of AppMasters and workers to handle failures, but the life cycles of Worker and AppMaster are not bound to a Master Actor by supervision, so that a Master node can fail independently. Several Master Actors form an Akka cluster, and the Master state is exchanged using the Gossip protocol in a conflict-free, consistent way, so that there is no single point of failure. With this hierarchy design, we are able to achieve high availability.

### Application Clock and Global Clock Service

The global clock service tracks the minimum timestamp of all pending messages in the system. Every task reports its own minimum clock to the global clock service; the minimum clock of a task is decided by the minimum of:

- Minimum timestamp of all pending messages in the inbox.
- Minimum timestamp of all un-acked outgoing messages. When there is message loss, the minimum clock will not advance.
- Minimum clock of all task states. If the state is accumulated from a lot of input messages, then the clock value is decided by the oldest message's timestamp. The state clock advances by taking snapshots to persistent storage or by fading out the effect of old messages.

The global clock service keeps track of all task minimum clocks efficiently and maintains a global view of the minimum clock. The global minimum clock value is monotonically increasing; it means that all source messages before this clock value have been processed. If there is message loss or a task crash, the global minimum clock will stop advancing.

### How do we optimize the message passing performance?

For a streaming application, message passing performance is extremely important. For example, one streaming platform may need to process millions of messages per second with millisecond-level latency. High throughput and low latency are not easy to achieve.
There are a number of challenges:

#### First Challenge: Network is not efficient for small messages

In streaming, the typical message size is very small, usually less than 100 bytes per message, like floating car GPS data. But network efficiency is very poor when transferring small messages. As you can see in the diagram below, when the message size is 50 bytes, only about 20% of the bandwidth can be used. How do we improve the throughput?

#### Second Challenge: Message overhead is too big

Each message sent between two actors contains the sender and receiver actor paths. When sending over the wire, the overhead of this ActorPath is not trivial. For example, the actor path below takes more than 200 bytes.

    :::bash
    akka.tcp://[email protected]:51582/remote/akka.tcp/[email protected]:48948/remote/akka.tcp/[email protected]:43676/user/master/Worker1/app_0_executor_0/group_1_task_0#-768886794

#### How do we solve this?

We implement a custom Netty transportation layer as an Akka extension. In the diagram below, the Netty Client translates the ActorPath to a TaskId, and the Netty Server translates it back. Only the TaskId is passed on the wire; it is only about 10 bytes, so the overhead is minimized. Different Netty Client Actors are isolated; they will not block each other.

For performance, effective batching is really the key! We group multiple messages into a single batch and send it on the wire. The batch size is not fixed; it is adjusted dynamically based on network status. If the network is available, we flush pending messages immediately without waiting; otherwise we put the message in a batch and trigger a timer to flush the batch later.

### How do we do flow control?

Without flow control, one task can easily flood another task with too many messages, causing an out-of-memory error. Typical flow control uses a TCP-like sliding window, so that source and target can run concurrently without blocking each other.
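A TCP-like sliding window per output channel can be sketched as follows. This is a simplified illustration under our own naming, not Gearpump's actual implementation: at most `window` unacked messages may be in flight, and the source is back-pressured once the window fills.

```python
class OutputChannel:
    """Sliding-window flow control: at most `window` unacked messages in flight."""

    def __init__(self, window):
        self.window = window
        self.sent = 0    # messages sent so far
        self.acked = 0   # messages acknowledged by the receiver (cumulative)

    def can_send(self):
        return self.sent - self.acked < self.window

    def send(self, msg, transport):
        if not self.can_send():
            return False  # window full: caller must buffer; source is back-pressured
        transport(msg)
        self.sent += 1
        return True

    def on_ack(self, ack_count):
        # The receiver acks cumulatively: the total number of messages received.
        self.acked = max(self.acked, ack_count)
```

Because the window caps in-flight messages per channel, a slow downstream task naturally throttles its upstream without either side blocking a thread.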
Figure: Flow control, each task is "star" connected to input tasks and output tasks

The difficult part of our problem is that each task can have multiple input tasks and output tasks. The input and output must be geared together so that back pressure can be properly propagated from downstream to upstream. The flow control also needs to consider failures, and it needs to be able to recover when there is message loss.

Another challenge is that the overhead of flow control messages can be big. If we ack every message, there will be a huge amount of ack messages in the system, degrading streaming performance. The approach we adopted is to use an explicit AckRequest message. The target tasks only ack back when they receive an AckRequest message, and the source only sends an AckRequest when necessary. With this approach, we can largely reduce the overhead.

### How do we detect message loss?

For example, for web ads we may charge for every click, so we don't want to miscount. The streaming platform needs to effectively track which messages have been lost, and recover as fast as possible.

Figure: Message Loss Detection

We use the flow control messages AckRequest and Ack to detect message loss. The target task counts how many messages have been received since the last AckRequest, and acks the count back to the source task. The source task checks the count and detects message loss.

This is just an illustration; the real case is more difficult, as we need to handle zombie tasks and in-flight stale messages.

### How does Gearpump know what messages to replay?

In some applications, a message cannot be lost and must be replayed. For example, during a money transfer, the bank SMSes us the verification code. If that message is lost, the system must replay it so that the money transfer can continue. We made the decision to use **source end message storage** and **time stamp based replay**.
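The idea of a source-end message store with timestamp-based replay can be sketched as follows. This is a minimal illustration with hypothetical names, not Gearpump's actual implementation; it assumes messages are appended in approximately increasing timestamp order.

```python
class SourceEndMessageStore:
    """Source-end store: keeps sent messages so they can be replayed by timestamp."""

    def __init__(self):
        self.messages = []  # (timestamp, payload) pairs, approximately time-ordered

    def append(self, timestamp, payload):
        self.messages.append((timestamp, payload))

    def replay_from(self, timestamp):
        # A timestamp filter drops messages older than the requested point,
        # since the store may not be strictly time-ordered.
        return [m for m in self.messages if m[0] >= timestamp]
```

During recovery, the source would call `replay_from` with the recovered global minimum clock and re-send everything from that point downstream.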
Figure: Replay with Source End Message Store

Every message is immutable and tagged with a timestamp. We assume that timestamps are approximately incremental (a small ratio of message disorder is allowed).

We assume messages come from a replayable source, like a Kafka queue; otherwise messages are stored in a customizable source-end "message store". When the source task sends a message downstream, the timestamp and offset of the message are also checkpointed to offset-timestamp storage periodically. During recovery, the system first retrieves the right timestamp and offset from the offset-timestamp storage, then replays the message store from that timestamp and offset. A Timestamp Filter filters out old messages in case the messages in the message store are not strictly time-ordered.

### Master High Availability

In a distributed streaming system, any part can fail. The system must stay responsive and recover in case of errors.

Figure: Master High Availability

We use Akka clustering to implement Master high availability. The cluster consists of several master nodes, but no worker nodes. With clustering facilities, we can easily detect and handle master node crashes. The master state is replicated on all master nodes with the Typesafe akka-data-replication library; when one master node crashes, a standby master reads the master state and takes over. The master state contains the submission data of all applications. If one application dies, a master can use that state to recover the application. The CRDT LwwMap is used to represent the state; it is a hash map that can converge on distributed nodes without conflict. To have strong data consistency, state reads and writes must happen on a quorum of master nodes.

### How do we handle failures?

With Akka's powerful actor supervision, we can implement a resilient system with relative ease.
In Gearpump, each application has its own AppMaster instance, and applications are totally isolated from each other. For each application, there is a supervision tree, AppMaster->Executor->Task. With this supervision hierarchy, we can free ourselves from the headache of zombie processes; for example, if an AppMaster is down, the Akka supervisor ensures the whole tree is shut down.

There are multiple possible failure scenarios:

Figure: Possible Failure Scenarios and Error Supervision Hierarchy

#### What happens when the Master crashes?

In case of a master crash, other standby masters will be notified; they will restore the master state and take over control. Workers and AppMasters will also be notified; they will trigger a process to find the new active master until the resolution completes. If an AppMaster or Worker cannot resolve a new Master within a timeout, it will kill itself.

#### What happens when a worker crashes?

In case of a worker crash, the Master will get notified and stop scheduling new computation to this worker. All supervised executors on the current worker will be killed; the AppMaster can treat this as an executor crash, as described in [What happens when an executor crashes?](#what-happens-when-an-executor-crashes)

#### What happens when the AppMaster crashes?

If an AppMaster crashes, the Master will schedule new resources to create a new AppMaster instance elsewhere, and then the AppMaster will handle the recovery inside the application. For streaming, it will recover the latest minimum clock and other state from disk, request resources from the master to start executors, and restart the tasks with the recovered minimum clock.

#### What happens when an executor crashes?

If an executor crashes, its supervisor AppMaster will get notified and request a new resource from the active master to start a new executor, which will run the tasks that were located on the crashed executor.

#### What happens when tasks crash?
If a task throws an exception, its supervising executor will restart that task.

When "at least once" message delivery is enabled, message replay is triggered in the case of message loss. First, the AppMaster reads the latest minimum clock from the global clock service (or from clock storage if the clock service has crashed); then the AppMaster restarts all the task actors to get a fresh task state; then the source-end tasks replay messages from that minimum clock.

### How does "exactly-once" message delivery work?

For some applications, it is extremely important to have "exactly once" message delivery. For example, for a real-time billing system, we don't want to bill the customer twice. The goals of "exactly once" message delivery are to make sure that:

- Errors don't accumulate; today's error will not be carried over to tomorrow.
- It is transparent to the application developer.

We use the global clock to synchronize the distributed transactions. We assume every message from the data source has a unique timestamp; the timestamp can be part of the message body, or can be attached later with the system clock when the message is injected into the streaming system. With this globally synchronized clock, we can coordinate all tasks to checkpoint at the same timestamp.

Figure: Checkpointing and Exactly-Once Message delivery

Workflow to do state checkpointing:

1. The coordinator asks the streaming system to do a checkpoint at timestamp Tc.
2. Each application task maintains two states, checkpoint state and current state. The checkpoint state only contains information before timestamp Tc. The current state contains all information.
3. When the global minimum clock is larger than Tc, it means all messages older than Tc have been processed; the checkpoint state will no longer change, so we can safely persist the checkpoint state to storage.
4. When there is message loss, we start the recovery process.
5.
To recover, load the latest checkpoint state from the store, and then use it to restore the application status.
6. The data source replays messages from the checkpoint timestamp.

The checkpoint interval is determined dynamically by the global clock service. Each data source tracks the max timestamp of input messages. Upon receiving min clock updates, the data source reports the time delta back to the global clock service. The max time delta is the upper bound of the application state timespan. The checkpoint interval is bigger than the max delta time:

Figure: How to determine Checkpoint Interval

After the global clock service notifies tasks of the checkpoint interval, each task calculates its next checkpoint timestamp autonomously, without global synchronization.

Each task contains two states, checkpoint state and current state. The code to update the states is shown in the listing below.

    :::python
    TaskState(stateStore, initialTimestamp):
      currentState = stateStore.load(initialTimestamp)
      checkpointState = currentState.clone
      checkpointTimestamp = nextCheckpointTimestamp(initialTimestamp)
      maxClock = initialTimestamp

    onMessage(msg):
      if msg.timestamp < checkpointTimestamp:
        checkpointState.updateMessage(msg)
      currentState.updateMessage(msg)
      maxClock = max(maxClock, msg.timestamp)

    onMinClock(minClock):
      if minClock > checkpointTimestamp:
        stateStore.persist(checkpointState)
        checkpointTimestamp = nextCheckpointTimestamp(maxClock)
        checkpointState = currentState.clone

    onNewCheckpointInterval(newStep):
      step = newStep

    nextCheckpointTimestamp(timestamp):
      return (1 + timestamp / step) * step

List 1: Task Transactional State Implementation

### What is a dynamic graph, and how does it work?

The DAG can be modified dynamically. We want to be able to dynamically add, remove, and replace a sub-graph.
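A sub-graph replacement can be sketched as follows. This is a toy illustration under an assumed adjacency-set representation of the DAG, not Gearpump's actual graph API: replacing a processor means moving its outgoing edges to the new processor and rewiring every edge that pointed at the old one.

```python
def replace_processor(dag, old, new):
    """Replace processor `old` with `new` in a DAG represented as
    {processor_name: set of downstream processor names}."""
    # The new processor inherits the old processor's outgoing edges.
    dag[new] = dag.pop(old)
    # Rewire every incoming edge from the old processor to the new one.
    for downstream in dag.values():
        if old in downstream:
            downstream.discard(old)
            downstream.add(new)
    return dag
```

Attach and remove operations would be analogous: attach adds a node plus its edges, and remove deletes a node and rewires or drops the edges around it.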
Figure: Dynamic Graph, Attach, Replace, and Remove

## At least once message delivery and Kafka

The Kafka source example project and tutorials can be found at:

- [Kafka connector example project](https://github.com/apache/incubator-gearpump/tree/master/examples/streaming/kafka)
- [Connect with Kafka source](../dev/dev-connectors)

In this doc, we will talk about how at least once message delivery works.

We will use the WordCount example from the [source tree](https://github.com/apache/incubator-gearpump/tree/master/examples/streaming/kafka) to illustrate.

### What the Kafka WordCount DAG looks like

It contains three processors:

- KafkaStreamProducer (or KafkaSource) reads messages from the Kafka queue.
- Split splits lines into words.
- Sum counts the occurrences of each word.

### How to read data from Kafka

We use KafkaSource; please check [Connect with Kafka source](../dev/dev-connectors) for an introduction.

Please note that we have set a startTimestamp for the KafkaSource, which means KafkaSource will read from the Kafka queue starting from messages whose timestamp is near startTimestamp.

### What happens when there is a task crash or message loss?

When there is message loss, the AppMaster first pauses the global clock service so that the global minimum timestamp no longer changes, then it restarts the Kafka source tasks. Upon restart, KafkaSource starts to replay. It first reads the global minimum timestamp from the AppMaster, and then starts to read messages from that timestamp.

### How does KafkaSource read messages from a start timestamp, given that the Kafka queue doesn't expose timestamp information?

The Kafka queue only exposes offset information for each partition. KafkaSource maintains its own mapping from Kafka offset to application timestamp, so that it can map an application timestamp to a Kafka offset and replay Kafka messages from that offset.
The mapping between application timestamps and Kafka offsets is stored in a distributed file system or as a Kafka topic.
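The offset lookup can be sketched as follows. This is a simplified, hypothetical illustration (not the real KafkaSource code): it records (timestamp, offset) checkpoints per partition and, given a start timestamp, finds the offset of the first checkpoint at or after it.

```python
import bisect

class OffsetTimestampStore:
    """Maps application timestamps to Kafka offsets so replay can start
    from the offset of the first message at or after a given timestamp."""

    def __init__(self):
        self.timestamps = []  # checkpointed timestamps, assumed sorted
        self.offsets = []     # Kafka offset recorded at each timestamp

    def record(self, timestamp, offset):
        self.timestamps.append(timestamp)
        self.offsets.append(offset)

    def offset_for(self, timestamp):
        # Binary search for the first checkpoint at or after `timestamp`.
        i = bisect.bisect_left(self.timestamps, timestamp)
        return self.offsets[i] if i < len(self.offsets) else None
```

In the real system this store would be persisted (e.g. to a distributed file system or a Kafka topic, as the text notes), so the mapping survives task restarts.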
http://git-wip-us.apache.org/repos/asf/incubator-gearpump/blob/5f90b70f/docs/contents/introduction/message-delivery.md ---------------------------------------------------------------------- diff --git a/docs/contents/introduction/message-delivery.md b/docs/contents/introduction/message-delivery.md new file mode 100644 index 0000000..1ffeb9e --- /dev/null +++ b/docs/contents/introduction/message-delivery.md @@ -0,0 +1,47 @@

## What is At Least Once Message Delivery?

Messages can be lost on delivery due to network partitions. **At Least Once Message Delivery** (at least once) means the lost messages are delivered one or more times, such that at least one is processed and acknowledged by the whole flow.

Gearpump guarantees at least once for any source that is able to replay messages from a past timestamp. In Gearpump, each message is tagged with a timestamp, and the system tracks the minimum timestamp of all pending messages (the global minimum clock). On message loss, the application will be restarted from the global minimum clock. Since the source is able to replay from the global minimum clock, all pending messages from before the restart will be replayed. Gearpump calls that kind of source a `TimeReplayableSource` and already provides a built-in [KafkaSource](gearpump-internals#at-least-once-message-delivery-and-kafka). With the KafkaSource to ingest data into Gearpump, users are guaranteed at least once message delivery.

## What is Exactly Once Message Delivery?

At least once delivery doesn't guarantee the correctness of the application result. For instance, for a task keeping a count of received messages, duplicated messages cause an overcount, and the count is lost on task failure. In that case, **Exactly Once Message Delivery** (exactly once) is required, where state is updated by a message exactly once. This further requires that duplicated messages are filtered out and in-memory states are persisted.
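The requirement can be sketched as follows. This toy example (our own naming, not Gearpump's API) shows why checkpointing state together with its clock gives exactly-once updates: on failure, the state is rolled back to the checkpoint, and the source replays from the checkpoint clock, so no message is counted twice.

```python
class CheckpointedCount:
    """Counting state checkpointed together with its clock, so that
    recovery plus replay-from-checkpoint updates state exactly once."""

    def __init__(self):
        self.count = 0
        self.clock = 0            # max timestamp seen so far
        self._checkpoint = (0, 0)  # persisted (count, clock)

    def on_message(self, timestamp):
        self.count += 1
        self.clock = max(self.clock, timestamp)

    def checkpoint(self):
        # In a real system this pair would go to a persistent store.
        self._checkpoint = (self.count, self.clock)

    def recover(self):
        # Restore state at the checkpoint clock; the source then replays
        # messages after that clock, so nothing is double-counted.
        self.count, self.clock = self._checkpoint
        return self.clock
```

Without the checkpointed clock, a replay after failure would re-apply already-counted messages, which is exactly the overcount problem described above.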
Users are guaranteed exactly once in Gearpump if they use both a `TimeReplayableSource` to ingest data and the Persistent API to manage their in-memory states. With the Persistent API, user state is periodically checkpointed by the system to a persistent store (e.g. HDFS) along with its checkpoint time. Gearpump tracks the global minimum checkpoint timestamp of all pending states (the global minimum checkpoint clock), which is persisted as well. On application restart, the system restores states at the global minimum checkpoint clock and the source replays messages from that clock. This ensures that a message updates all states exactly once.

### Persistent API

The Persistent API consists of `PersistentTask` and `PersistentState`.

Here is an example of using them to keep a count of incoming messages.

    :::scala
    class CountProcessor(taskContext: TaskContext, conf: UserConfig)
      extends PersistentTask[Long](taskContext, conf) {

      override def persistentState: PersistentState[Long] = {
        import com.twitter.algebird.Monoid.longMonoid
        new NonWindowState[Long](new AlgebirdMonoid(longMonoid), new ChillSerializer[Long])
      }

      override def processMessage(state: PersistentState[Long], message: Message): Unit = {
        state.update(message.timestamp, 1L)
      }
    }

The `CountProcessor` creates a customized `PersistentState`, which will be managed by `PersistentTask`, and overrides the `processMessage` method to define how the state is updated on a new message (each new message counts as `1`, which is added to the existing value).

Gearpump already offers two types of state:

1. NonWindowState - state with no time or other boundary
2. WindowState - each state is bounded by a time window

They are intended for states that satisfy monoid laws:

1. has a binary associative operation, like `+`
2.
has an identity element, like `0`

In the above example, we make use of the `longMonoid` from [Twitter's Algebird](https://github.com/twitter/algebird) library, which provides a bunch of useful monoids. \ No newline at end of file http://git-wip-us.apache.org/repos/asf/incubator-gearpump/blob/5f90b70f/docs/contents/introduction/performance-report.md ---------------------------------------------------------------------- diff --git a/docs/contents/introduction/performance-report.md b/docs/contents/introduction/performance-report.md new file mode 100644 index 0000000..e4254d1 --- /dev/null +++ b/docs/contents/introduction/performance-report.md @@ -0,0 +1,34 @@

# Performance Evaluation

To illustrate the performance of Gearpump, we mainly focus on two aspects, throughput and latency, using a micro-benchmark called SOL (an example in the Gearpump package) whose topology is quite simple: SOLStreamProducer delivers messages to SOLStreamProcessor constantly, and SOLStreamProcessor does nothing. We set up a 4-node cluster with a 10GbE network; each node's hardware is briefly shown as follows:

Processor: 32 core Intel(R) Xeon(R) CPU E5-2690 2.90GHz
Memory: 64GB

## Throughput

We tried to explore the upper bound of the throughput. After launching 48 SOLStreamProducers and 48 SOLStreamProcessors, the figure below shows that the whole throughput of the cluster can reach about 18 million messages/second (100 bytes per message).

## Latency

When we transfer messages at the max throughput above, the average latency between two tasks is 8ms.

## Fault Recovery time

When a failure is detected, for example when an Executor is down, Gearpump reallocates the resources and restarts the application. It takes about 10 seconds to recover the application.

## How to set up the benchmark environment?

### Prepare the env

1). Set up a 4-node Gearpump cluster with a 10GbE network and 4 Workers on each node.
In our test environment, each node has 64GB memory and a 32-core Intel(R) Xeon(R) E5-2690 2.90GHz processor. Make sure metrics are enabled in Gearpump.

2). Submit a SOL application with 48 StreamProducers and 48 StreamProcessors:

    :::bash
    bin/gear app -jar ./examples/sol-{{SCALA_BINARY_VERSION}}-{{GEARPUMP_VERSION}}-assembly.jar -streamProducer 48 -streamProcessor 48

3). Launch Gearpump's dashboard, browse to http://$HOST:8090/, and switch to the Applications tab to see the detailed information of your application. HOST should be the node that runs the dashboard. http://git-wip-us.apache.org/repos/asf/incubator-gearpump/blob/5f90b70f/docs/contents/introduction/submit-your-1st-application.md ---------------------------------------------------------------------- diff --git a/docs/contents/introduction/submit-your-1st-application.md b/docs/contents/introduction/submit-your-1st-application.md new file mode 100644 index 0000000..21cfaf2 --- /dev/null +++ b/docs/contents/introduction/submit-your-1st-application.md @@ -0,0 +1,39 @@

Before you can submit and run your first Gearpump application, you need a running Gearpump service. There are multiple ways to run Gearpump: [Local mode](../deployment/deployment-local), [Standalone mode](../deployment/deployment-standalone), [YARN mode](../deployment/deployment-yarn) or [Docker mode](../deployment/deployment-docker).

The easiest way is to run Gearpump in [Local mode](../deployment/deployment-local). Any Linux, MacOSX or Windows desktop can be used with zero configuration.

In the example below, we assume you are running in [Local mode](../deployment/deployment-local). If you are running Gearpump in one of the other modes, you will need to configure the Gearpump client to connect to the Gearpump service by setting the `gear.conf` configuration path in the classpath. Within this file, you will need to change the parameter `gearpump.cluster.masters` to the correct Gearpump master(s).
See [Configuration](../deployment/deployment-configuration) for details.

## Steps to submit your first Application

### Step 1: Submit application

After the cluster is started, you can submit an example wordcount application to the cluster.

Open another shell:

    :::bash
    ### To run WordCount example
    bin/gear app -jar examples/wordcount-{{SCALA_BINARY_VERSION}}-{{GEARPUMP_VERSION}}-assembly.jar org.apache.gearpump.streaming.examples.wordcount.WordCount

### Step 2: Congratulations, you've submitted your first application.

To view the application status and metrics, start the Web UI services, and browse to [http://127.0.0.1:8090](http://127.0.0.1:8090) to check the status. The default username and password are "admin:admin"; you can check [UI Authentication](../deployment/deployment-ui-authentication) to find out how to manage users.

**NOTE:** the UI port setting can be defined in the configuration; please check the [Configuration](../deployment/deployment-configuration) section.

## A quick Look at the Web UI
TBD

## Other Application Examples

Besides wordcount, there are several other example applications. Please check the source tree examples/ for detailed information.
http://git-wip-us.apache.org/repos/asf/incubator-gearpump/blob/5f90b70f/docs/docs/api/java.md ---------------------------------------------------------------------- diff --git a/docs/docs/api/java.md b/docs/docs/api/java.md deleted file mode 100644 index 3b94f91..0000000 --- a/docs/docs/api/java.md +++ /dev/null @@ -1 +0,0 @@ -Placeholder http://git-wip-us.apache.org/repos/asf/incubator-gearpump/blob/5f90b70f/docs/docs/api/scala.md ---------------------------------------------------------------------- diff --git a/docs/docs/api/scala.md b/docs/docs/api/scala.md deleted file mode 100644 index 3b94f91..0000000 --- a/docs/docs/api/scala.md +++ /dev/null @@ -1 +0,0 @@ -Placeholder http://git-wip-us.apache.org/repos/asf/incubator-gearpump/blob/5f90b70f/docs/docs/deployment/deployment-configuration.md ---------------------------------------------------------------------- diff --git a/docs/docs/deployment/deployment-configuration.md b/docs/docs/deployment/deployment-configuration.md deleted file mode 100644 index 1dadbd7..0000000 --- a/docs/docs/deployment/deployment-configuration.md +++ /dev/null @@ -1,84 +0,0 @@ -## Master and Worker configuration - -Master and Worker daemons will only read configuration from `conf/gear.conf`. - -Master reads configuration from section master and gearpump: - - :::bash - master { - } - gearpump{ - } - - -Worker reads configuration from section worker and gearpump: - - :::bash - worker { - } - gearpump{ - } - - -## Configuration for user submitted application job - -For user application job, it will read configuration file `gear.conf` and `application.conf` from classpath, while `application.conf` has higher priority. -The default classpath contains: - -1. `conf/` -2. current working directory. - -For example, you can put a `application.conf` on your working directory, and then it will be effective when you submit a new job application. - -## Logging - -To change the log level, you need to change both `gear.conf`, and `log4j.properties`. 
- -### To change the log level for master and worker daemon - -Please change `log4j.rootLevel` in `log4j.properties`, `gearpump-master.akka.loglevel` and `gearpump-worker.akka.loglevel` in `gear.conf`. - -### To change the log level for application job - -Please change `log4j.rootLevel` in `log4j.properties`, and `akka.loglevel` in `gear.conf` or `application.conf`. - -## Gearpump Default Configuration - -This is the default configuration for `gear.conf`. - -| config item | default value | description | -| -------------- | -------------- | ---------------- | -| gearpump.hostname | "127.0.0.1" | hostname of current machine. If you are using local mode, then set this to 127.0.0.1. If you are using cluster mode, make sure this hostname can be accessed by other machines. | -| gearpump.cluster.masters | ["127.0.0.1:3000"] | Config to set the master nodes of the cluster. If there are multiple master in the list, then the master nodes runs in HA mode. For example, you may start three master, on node1: `bin/master -ip node1 -port 3000`, on node2: `bin/master -ip node2 -port 3000`, on node3: `bin/master -ip node3 -port 3000`, then you need to set `gearpump.cluster.masters = ["node1:3000","node2:3000","node3:3000"]` | -| gearpump.task-dispatcher | "gearpump.shared-thread-pool-dispatcher" | default dispatcher for task actor | -| gearpump.metrics.enabled | true | flag to enable the metrics system | -| gearpump.metrics.sample-rate | 1 | We will take one sample every `gearpump.metrics.sample-rate` data points. Note it may have impact that the statistics on UI portal is not accurate. Change it to 1 if you want accurate metrics in UI | -| gearpump.metrics.report-interval-ms | 15000 | we will report once every 15 seconds | -| gearpump.metrics.reporter | "akka" | available value: "graphite", "akka", "logfile" which write the metrics data to different places. 
| -| gearpump.retainHistoryData.hours | 72 | max hours of history data to retain, Note: Due to implementation limitation(we store all history in memory), please don't set this to too big which may exhaust memory. | -| gearpump.retainHistoryData.intervalMs | 3600000 | time interval between two data points for history data (unit: ms). Usually this is set to a big value so that we only store coarse-grain data | -| gearpump.retainRecentData.seconds | 300 | max seconds of recent data to retain. This is for the fine-grain data | -| gearpump.retainRecentData.intervalMs | 15000 | time interval between two data points for recent data (unit: ms) | -| gearpump.log.daemon.dir | "logs" | The log directory for daemon processes(relative to current working directory) | -| gearpump.log.application.dir | "logs" | The log directory for applications(relative to current working directory) | -| gearpump.serializers | a map | custom serializer for streaming application, e.g. `"scala.Array" = ""` | -| gearpump.worker.slots | 1000 | How many slots each worker contains | -| gearpump.appmaster.vmargs | "-server -Xss1M -XX:+HeapDumpOnOutOfMemoryError -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=80 -XX:+UseParNewGC -XX:NewRatio=3 -Djava.rmi.server.hostname=localhost" | JVM arguments for AppMaster | -| gearpump.appmaster.extraClasspath | "" | JVM default class path for AppMaster | -| gearpump.executor.vmargs | "-server -Xss1M -XX:+HeapDumpOnOutOfMemoryError -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=80 -XX:+UseParNewGC -XX:NewRatio=3 -Djava.rmi.server.hostname=localhost" | JVM arguments for executor | -| gearpump.executor.extraClasspath | "" | JVM default class path for executor | -| gearpump.jarstore.rootpath | "jarstore/" | Define where the submitted jar file will be stored. This path follows the hadoop path schema. For HDFS, use `hdfs://host:port/path/`, and HDFS HA, `hdfs://namespace/path/`; if you want to store on master nodes, then use local directory. 
`jarstore.rootpath = "jarstore/"` will point to a directory relative to where the master is started. `jarstore.rootpath = "/jarstore/"` will point to an absolute directory on the master server | -| gearpump.scheduling.scheduler-class | "org.apache.gearpump.cluster.scheduler.PriorityScheduler" | Class used to schedule the applications. | -| gearpump.services.host | "127.0.0.1" | dashboard UI host address | -| gearpump.services.port | 8090 | dashboard UI host port | -| gearpump.netty.buffer-size | 5242880 | netty connection buffer size | -| gearpump.netty.max-retries | 30 | maximum number of retries for a netty client to connect to a remote host | -| gearpump.netty.base-sleep-ms | 100 | base sleep time for a netty client to retry a connection. The actual sleep time is a multiple of this value | -| gearpump.netty.max-sleep-ms | 1000 | maximum sleep time for a netty client to retry a connection | -| gearpump.netty.message-batch-size | 262144 | netty max batch size | -| gearpump.netty.flush-check-interval | 10 | max flush interval for the netty layer, in milliseconds | -| gearpump.netty.dispatcher | "gearpump.shared-thread-pool-dispatcher" | default dispatcher for netty client and server | -| gearpump.shared-thread-pool-dispatcher | default Dispatcher with "fork-join-executor" | default shared thread pool dispatcher | -| gearpump.single-thread-dispatcher | PinnedDispatcher | default single thread dispatcher | -| gearpump.serialization-framework | "org.apache.gearpump.serializer.FastKryoSerializationFramework" | Gearpump has a built-in serialization framework using Kryo. Users are allowed to use a different serialization framework, like Protobuf. 
See `org.apache.gearpump.serializer.FastKryoSerializationFramework` for how a custom serialization framework can be defined | -| worker.executor-share-same-jvm-as-worker | false | whether the executor actor is started in the same JVM (process) as the worker actor. This setting is intended for the convenience of single-machine debugging; however, the app jar needs to be added to the worker's classpath when you set it to true and have a 'real' worker in the cluster | \ No newline at end of file http://git-wip-us.apache.org/repos/asf/incubator-gearpump/blob/5f90b70f/docs/docs/deployment/deployment-docker.md ---------------------------------------------------------------------- diff --git a/docs/docs/deployment/deployment-docker.md b/docs/docs/deployment/deployment-docker.md deleted file mode 100644 index c71ed9d..0000000 --- a/docs/docs/deployment/deployment-docker.md +++ /dev/null @@ -1,5 +0,0 @@ -## Gearpump Docker Container - -There is a pre-built Docker container available at [Docker Repo](https://hub.docker.com/r/gearpump/gearpump/) - -Check the documentation there to learn how to launch a Gearpump cluster in one line. http://git-wip-us.apache.org/repos/asf/incubator-gearpump/blob/5f90b70f/docs/docs/deployment/deployment-ha.md ---------------------------------------------------------------------- diff --git a/docs/docs/deployment/deployment-ha.md b/docs/docs/deployment/deployment-ha.md deleted file mode 100644 index 9e907c0..0000000 --- a/docs/docs/deployment/deployment-ha.md +++ /dev/null @@ -1,75 +0,0 @@ -To support HA, we allow starting masters on multiple nodes. They form a quorum to decide consistency. For example, if we start masters on 5 nodes and 2 nodes go down, the cluster is still consistent and functional. - -Here are the steps to enable HA mode: - -### 1. Configure. - -#### Select master machines - -Distribute the package to all nodes. Modify `conf/gear.conf` on all nodes. 
You MUST configure - - :::bash - gearpump.hostname - -to make it point to your hostname (or IP), and - - :::bash - gearpump.cluster.masters - -to a list of master nodes. For example, if you have 3 master nodes (node1, node2, and node3), then `gearpump.cluster.masters` can be set as - - :::bash - gearpump.cluster { - masters = ["node1:3000", "node2:3000", "node3:3000"] - } - - -#### Configure distributed storage to store application jars. -In `conf/gear.conf`, for the entry `gearpump.jarstore.rootpath`, please choose the storage folder for application jars. You need to make sure this jar storage is highly available. The following storage options are supported: - - 1). HDFS - - You need to configure the `gearpump.jarstore.rootpath` like this - - :::bash - hdfs://host:port/path/ - - - For HDFS HA, - - :::bash - hdfs://namespace/path/ - - - 2). Shared NFS folder - - First you need to map the NFS directory to a local directory (same path) on all master node machines. -Then you need to set the `gearpump.jarstore.rootpath` like this: - - :::bash - file:///your_nfs_mapping_directory - - - 3). If you don't set this value, the local directory of the master node will be used. - NOTE! There is no HA guarantee in this case, which means we are unable to recover running applications when the master goes down. - -### 2. Start Daemon. - -On node1, node2 and node3, start the master - - :::bash - ## on node1 - bin/master -ip node1 -port 3000 - - ## on node2 - bin/master -ip node2 -port 3000 - - ## on node3 - bin/master -ip node3 -port 3000 - - -### 3. Done! - -Now you have a highly available HA cluster. You can kill any master node and HA will take effect. - -**NOTE**: It can take up to 15 seconds for a master node to fail over. 
You can change the fail-over timeout by setting `gearpump-master.akka.cluster.auto-down-unreachable-after` in `gear.conf` to a smaller value, e.g. `gearpump-master.akka.cluster.auto-down-unreachable-after=10s` http://git-wip-us.apache.org/repos/asf/incubator-gearpump/blob/5f90b70f/docs/docs/deployment/deployment-local.md ---------------------------------------------------------------------- diff --git a/docs/docs/deployment/deployment-local.md b/docs/docs/deployment/deployment-local.md deleted file mode 100644 index 81e6029..0000000 --- a/docs/docs/deployment/deployment-local.md +++ /dev/null @@ -1,34 +0,0 @@ -You can start the Gearpump service in a single JVM (local mode), or in a distributed cluster (cluster mode). To start the cluster in local mode, you can use the local/local.bat helper scripts; they are very useful for developing or troubleshooting. - -Below are the steps to start a Gearpump service in **Local** mode: - -### Step 1: Get your Gearpump binary ready -To get your Gearpump service running in local mode, you first need to have a Gearpump distribution binary ready. -Please follow [this guide](get-gearpump-distribution) to get the binary. - -### Step 2: Start the cluster -You can start a local mode cluster in a single line - - :::bash - ## start the master and 2 workers in a single JVM. The master will listen on 3000 - ## you can Ctrl+C to kill the local cluster after you finish the startup tutorial. - bin/local - - -**NOTE:** You may need to execute `chmod +x bin/*` in shell to make the script file `local` executable. - -**NOTE:** You can change the default port by changing the config `gearpump.cluster.masters` in `conf/gear.conf`. - -**NOTE: Change the working directory**. Log files are by default generated under the current working directory. So, please "cd" to the required working directory before running the shell commands. - -**NOTE: Run as Daemon**. You can run it as a background process. For example, use [nohup](http://linux.die.net/man/1/nohup) on Linux. 
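As a sketch, the run-as-daemon approach above might look like the following (the paths are assumptions, and `sleep 300` stands in for the actual `bin/local` script so the commands can be tried anywhere):

    :::bash
    # Run a long-lived process (stand-in for `bin/local`) in the background,
    # immune to hangups, and record its PID so it can be stopped later.
    nohup sleep 300 > /tmp/local-cluster.out 2>&1 &
    echo $! > /tmp/local-cluster.pid

    # Later, stop the daemonized process using the recorded PID.
    kill "$(cat /tmp/local-cluster.pid)"
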
- -### Step 3: Start the Web UI server -Open another shell, - - :::bash - bin/services - -You can manage the applications in the UI at [http://127.0.0.1:8090](http://127.0.0.1:8090) or by the [Command Line tool](../introduction/commandline). -The default username and password is "admin:admin"; you can check -[UI Authentication](../deployment/deployment-ui-authentication) to learn how to manage users. http://git-wip-us.apache.org/repos/asf/incubator-gearpump/blob/5f90b70f/docs/docs/deployment/deployment-msg-delivery.md ---------------------------------------------------------------------- diff --git a/docs/docs/deployment/deployment-msg-delivery.md b/docs/docs/deployment/deployment-msg-delivery.md deleted file mode 100644 index 53e8c2e..0000000 --- a/docs/docs/deployment/deployment-msg-delivery.md +++ /dev/null @@ -1,60 +0,0 @@ -## How to deploy for At Least Once Message Delivery? - -As introduced in [What is At Least Once Message Delivery](../introduction/message-delivery#what-is-at-least-once-message-delivery), Gearpump has a built-in KafkaSource. To get at least once message delivery, users should deploy a Kafka cluster as the offset store along with the Gearpump cluster. - -Here's an example to deploy a local Kafka cluster. - -1. Download the latest Kafka from the official website and extract it to a local directory (`$KAFKA_HOME`) - -2. Boot up the single-node Zookeeper instance packaged with Kafka. - - :::bash - $KAFKA_HOME/bin/zookeeper-server-start.sh $KAFKA_HOME/config/zookeeper.properties - - -3. Start a Kafka broker - - :::bash - $KAFKA_HOME/bin/kafka-server-start.sh $KAFKA_HOME/config/kafka.properties - - -4. When creating an offset store for `KafkaSource`, set the zookeeper connect string to `localhost:2181` and the broker list to `localhost:9092` in `KafkaStorageFactory`. 
- - :::scala - val offsetStorageFactory = new KafkaStorageFactory("localhost:2181", "localhost:9092") - val source = new KafkaSource("topic1", "localhost:2181", offsetStorageFactory) - - -## How to deploy for Exactly Once Message Delivery? - -Exactly Once Message Delivery requires both an offset store and a checkpoint store. For the offset store, a Kafka cluster should be deployed as in the previous section. As for the checkpoint store, Gearpump has built-in support for Hadoop file systems, like HDFS. Hence, users should deploy an HDFS cluster alongside the Gearpump cluster. - -Here's an example to deploy a local HDFS cluster. - -1. Download Hadoop 2.6 from the official website and extract it to a local directory (`$HADOOP_HOME`) - -2. Add the following configuration to `$HADOOP_HOME/etc/hadoop/core-site.xml` - - :::xml - <configuration> - <property> - <name>fs.defaultFS</name> - <value>hdfs://localhost:9000</value> - </property> - </configuration> - - -3. Start HDFS - - :::bash - $HADOOP_HOME/sbin/start-dfs.sh - - -4. When creating a `HadoopCheckpointStore`, set the Hadoop configuration as in the `core-site.xml` - - :::scala - val hadoopConfig = new Configuration - hadoopConfig.set("fs.defaultFS", "hdfs://localhost:9000") - val checkpointStoreFactory = new HadoopCheckpointStoreFactory("MessageCount", hadoopConfig, new FileSizeRotation(1000)) - - \ No newline at end of file http://git-wip-us.apache.org/repos/asf/incubator-gearpump/blob/5f90b70f/docs/docs/deployment/deployment-resource-isolation.md ---------------------------------------------------------------------- diff --git a/docs/docs/deployment/deployment-resource-isolation.md b/docs/docs/deployment/deployment-resource-isolation.md deleted file mode 100644 index ee47802..0000000 --- a/docs/docs/deployment/deployment-resource-isolation.md +++ /dev/null @@ -1,112 +0,0 @@ -CGroup (abbreviated from control groups) is a Linux kernel feature to limit, account, and isolate resource usage (CPU, memory, disk I/O, etc.) 
of process groups. In Gearpump, we use cgroup to manage CPU resources. - -## Start CGroup Service - -The CGroup feature is only supported on Linux with kernel version 2.6.18 or later. Please also make sure SELinux is disabled before starting CGroup. - -The following steps are supposed to be executed by the root user. - -1. Check whether `/etc/cgconfig.conf` exists. If it does not, install it via `yum install libcgroup`. - -2. Run the following command to see whether the **cpu** subsystem is already mounted to the file system. - - :::bash - lssubsys -m - - Each subsystem in CGroup will have a corresponding mount path in the local file system. For example, the following output shows that the **cpu** subsystem is mounted to the file path `/sys/fs/cgroup/cpu` - - :::bash - cpu /sys/fs/cgroup/cpu - net_cls /sys/fs/cgroup/net_cls - blkio /sys/fs/cgroup/blkio - perf_event /sys/fs/cgroup/perf_event - - -3. If you want to assign permission to user **gear** to launch Gearpump Workers and applications with resource isolation enabled, you need to check gear's uid and gid in the `/etc/passwd` file; let's take **500** as an example. - -4. Add the following content to `/etc/cgconfig.conf` - - - # The mount point of cpu subsystem. - # If your system has already mounted it, this segment should be eliminated. - mount { - cpu = /cgroup/cpu; - } - - # Here the group name "gearpump" represents a node in CGroup's hierarchy tree. - # When the CGroup service is started, there will be a folder generated under the mount point of cpu subsystem, - # whose name is "gearpump". - - group gearpump { - perm { - task { - uid = 500; - gid = 500; - } - admin { - uid = 500; - gid = 500; - } - } - cpu { - } - } - - - Please note that if the output of step 2 shows that the **cpu** subsystem is already mounted, then the `mount` segment should not be included. - -5. Then start the cgroup service - - :::bash - sudo service cgconfig restart - - -6. 
There should be a folder **gearpump** generated under the mount point of the cpu subsystem, and its owner should be **gear:gear**. - -7. Repeat the above-mentioned steps on each machine where you want to launch Gearpump. - -## Enable Cgroups in Gearpump -1. Log in, as user **gear**, to the machine that has CGroup prepared. - - :::bash - ssh gear@node - - -2. Enter Gearpump's home folder and edit gear.conf under the folder `${GEARPUMP_HOME}/conf/` - - :::bash - gearpump.worker.executor-process-launcher = "org.apache.gearpump.cluster.worker.CGroupProcessLauncher" - - gearpump.cgroup.root = "gearpump" - - - Please note that the gearpump.cgroup.root value **gearpump** must be consistent with the group name in /etc/cgconfig.conf. - -3. Repeat the above-mentioned steps on each machine where you want to launch Gearpump - -4. Start the Gearpump cluster; please refer to [Deploy Gearpump in Standalone Mode](deployment-standalone) - -## Launch Application From Command Line -1. Log in to the machine that has the Gearpump distribution. - -2. Enter Gearpump's home folder and edit gear.conf under the folder `${GEARPUMP_HOME}/conf/` - - :::bash - gearpump.cgroup.cpu-core-limit-per-executor = ${your_preferred_int_num} - - - This configures the number of CPU cores each executor can use; -1 means no limitation - -3. Submit the application - - :::bash - bin/gear app -jar examples/sol-{{SCALA_BINARY_VERSION}}-{{GEARPUMP_VERSION}}-assembly.jar -streamProducer 10 -streamProcessor 10 - - -4. Then you can run the `top` command to monitor CPU usage. - -## Launch Application From Dashboard -If you want to submit the application from the dashboard, by default the `gearpump.cgroup.cpu-core-limit-per-executor` is inherited from the Worker's configuration. You can provide your own conf file to override it. - -## Limitations -Windows and Mac OS X don't support CGroup, so resource isolation will not work even if you turn it on; there will be no limit on a single executor's CPU usage. 
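To double-check the setup described above, a small verification script could locate the cpu subsystem mount point and confirm the `gearpump` group directory exists. This is a hypothetical sketch assuming a cgroup v1 layout; the mount point depends on your distribution:

    :::bash
    # Find where the cgroup cpu subsystem is mounted (cgroup v1 assumed),
    # then check whether the "gearpump" group directory was created there.
    CPU_MOUNT=$(awk '$3 == "cgroup" && $4 ~ /cpu/ {print $2; exit}' /proc/mounts)
    if [ -n "$CPU_MOUNT" ] && [ -d "$CPU_MOUNT/gearpump" ]; then
        echo "gearpump cgroup present at $CPU_MOUNT/gearpump"
    else
        echo "gearpump cgroup not found -- check /etc/cgconfig.conf and the cgconfig service"
    fi
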
http://git-wip-us.apache.org/repos/asf/incubator-gearpump/blob/5f90b70f/docs/docs/deployment/deployment-security.md ---------------------------------------------------------------------- diff --git a/docs/docs/deployment/deployment-security.md b/docs/docs/deployment/deployment-security.md deleted file mode 100644 index e20fc67..0000000 --- a/docs/docs/deployment/deployment-security.md +++ /dev/null @@ -1,80 +0,0 @@ -So far, Gearpump supports deployment in a secured Yarn cluster and writing to secured HBase, where "secured" means Kerberos enabled. -Further security-related features are in progress. - -## How to launch Gearpump in a secured Yarn cluster -Suppose user `gear` will launch gearpump on YARN; then the corresponding principal `gear` should be created on the KDC server. - -1. Create a Kerberos principal for user `gear`, on the KDC machine - - :::bash - sudo kadmin.local - - In the kadmin.local or kadmin shell, create the principal - - :::bash - kadmin: addprinc gear/[email protected] - - Remember that user `gear` must exist on every node of Yarn. - -2. Upload the gearpump-{{SCALA_BINARY_VERSION}}-{{GEARPUMP_VERSION}}.zip to a remote HDFS folder; we suggest putting it under `/usr/lib/gearpump/gearpump-{{SCALA_BINARY_VERSION}}-{{GEARPUMP_VERSION}}.zip` - -3. Create the HDFS folder /user/gear/, and make sure all read-write rights are granted for user `gear` - - :::bash - drwxr-xr-x - gear gear 0 2015-11-27 14:03 /user/gear - - -4. Put the YARN configurations under the classpath. - Before calling `yarnclient launch`, make sure you have put all yarn configuration files under the classpath. Typically, you can just copy all files under `$HADOOP_HOME/etc/hadoop` from one of the YARN cluster machines to `conf/yarnconf` of gearpump. `$HADOOP_HOME` points to the Hadoop installation directory. - -5. Get Kerberos credentials to submit the job: - - :::bash - kinit gearpump/[email protected] - - - Here you can log in with a keytab or password. Please refer to the Kerberos documentation for details. 
- - :::bash - yarnclient launch -package /usr/lib/gearpump/gearpump-{{SCALA_BINARY_VERSION}}-{{GEARPUMP_VERSION}}.zip - - -## How to write to secured HBase -When the remote HBase is security enabled, a Kerberos keytab and the corresponding principal name need to be -provided for the gearpump-hbase connector. Specifically, the `UserConfig` object passed into the HBaseSink should contain -`{("gearpump.keytab.file", "\\$keytab"), ("gearpump.kerberos.principal", "\\$principal")}`. Example code for writing to secured HBase: - - :::scala - val principal = "gearpump/[email protected]" - val keytabContent = Files.toByteArray(new File("path_to_keytab_file")) - val appConfig = UserConfig.empty - .withString("gearpump.kerberos.principal", principal) - .withBytes("gearpump.keytab.file", keytabContent) - val sink = new HBaseSink(appConfig, "$tableName") - val sinkProcessor = DataSinkProcessor(sink, "$sinkNum") - val split = Processor[Split]("$splitNum") - val computation = split ~> sinkProcessor - val application = StreamApplication("HBase", Graph(computation), UserConfig.empty) - - -Note that the keytab file set into the config should be a byte array. - -## Future Plan - -### More external components support -1. HDFS -2. Kafka - -### Authentication (Kerberos) -Since Gearpump's Master-Worker structure is similar to HDFS's NameNode-DataNode and Yarn's ResourceManager-NodeManager, we may follow their approach. - -1. User creates a Kerberos principal and keytab for Gearpump. -2. Deploy the keytab files to all the cluster nodes. -3. Configure Gearpump's conf file, specifying the Kerberos principal and local keytab file location. -4. Start Master and Worker. - -Every application has a submitter/user. We will separate applications from different users, e.g. with different log folders for different applications. -Only authenticated users can submit applications to Gearpump's Master. 
- -### Authorization -Hopefully more on this soon http://git-wip-us.apache.org/repos/asf/incubator-gearpump/blob/5f90b70f/docs/docs/deployment/deployment-standalone.md ---------------------------------------------------------------------- diff --git a/docs/docs/deployment/deployment-standalone.md b/docs/docs/deployment/deployment-standalone.md deleted file mode 100644 index c9d5549..0000000 --- a/docs/docs/deployment/deployment-standalone.md +++ /dev/null @@ -1,59 +0,0 @@ -Standalone mode is a distributed cluster mode. That is, Gearpump runs as a service without help from other services (e.g. YARN). - -To deploy Gearpump in cluster mode, please first check that the [Pre-requisites](hardware-requirement) are met. - -### How to Install -You need to have the Gearpump binary at hand. Please refer to [How to get gearpump distribution](get-gearpump-distribution) to get the Gearpump binary. - -It is suggested that you unzip the package to the same directory path on every machine on which you plan to install Gearpump. -To install Gearpump, you at least need to change the configuration in `conf/gear.conf`. - -Config | Default value | Description ------------- | ---------------|------------ -gearpump.hostname | "127.0.0.1" | Host or IP address of the current machine. The ip/host needs to be reachable from other machines in the cluster. -gearpump.cluster.masters | ["127.0.0.1:3000"] | List of all master nodes, with each item representing the host and port of one master. -gearpump.worker.slots | 1000 | how many slots this worker has - -Besides this, there are other optional configurations related to logs, metrics, transports, and UI. You can refer to the [Configuration Guide](deployment-configuration) for more details. - -### Start the Cluster Daemons in Standalone mode -In Standalone mode, you can start the master and workers in different JVMs. 
- -##### To start master: - - :::bash - bin/master -ip xx -port xx - -The ip and port will be checked against the settings under `conf/gear.conf`, so you need to make sure they are consistent. - -**NOTE:** You may need to execute `chmod +x bin/*` in shell to make the script file `master` executable. - -**NOTE**: for high availability, please check the [Master HA Guide](deployment-ha) - -##### To start worker: - - :::bash - bin/worker - -### Start UI - - :::bash - bin/services - - -After the UI is started, you can browse to `http://{web_ui_host}:8090` to view the cluster status. -The default username and password is "admin:admin"; you can check -[UI Authentication](deployment-ui-authentication) to learn how to manage users. - - - -**NOTE:** The UI port can be configured in `gear.conf`. Check the [Configuration Guide](deployment-configuration) for information. - -### Bash tool to start cluster - -There is a bash tool `bin/start-cluster.sh` that can launch the cluster conveniently. You need to change the files `conf/masters`, `conf/workers` and `conf/dashboard` to specify the corresponding machines. -Before running the bash tool, please make sure the Gearpump package is already unzipped to the same directory path on every machine. -`bin/stop-cluster.sh` stops the whole cluster. - -The bash tool is able to launch the cluster without changing the `conf/gear.conf` on every machine. The script sets `gearpump.cluster.masters` and other configurations using JAVA_OPTS. -However, please note that when you log into any of these unconfigured machines and try to launch the dashboard or submit an application, you still need to modify `conf/gear.conf` manually because JAVA_OPTS is missing. 
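As an illustration, the machine-list files read by `bin/start-cluster.sh` could be prepared like this. The hostnames `node1`-`node3` and the one-machine-per-line format are assumptions; check the comments shipped in those files for the exact expected format:

    :::bash
    # Sketch: write the host-list files consumed by start-cluster.sh.
    # node1..node3 are placeholders -- substitute your own machines.
    CONF_DIR=/tmp/gearpump-conf        # in a real setup this is ${GEARPUMP_HOME}/conf
    mkdir -p "$CONF_DIR"
    printf 'node1\n'        > "$CONF_DIR/masters"
    printf 'node2\nnode3\n' > "$CONF_DIR/workers"
    printf 'node1\n'        > "$CONF_DIR/dashboard"
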
http://git-wip-us.apache.org/repos/asf/incubator-gearpump/blob/5f90b70f/docs/docs/deployment/deployment-ui-authentication.md ---------------------------------------------------------------------- diff --git a/docs/docs/deployment/deployment-ui-authentication.md b/docs/docs/deployment/deployment-ui-authentication.md deleted file mode 100644 index 0990192..0000000 --- a/docs/docs/deployment/deployment-ui-authentication.md +++ /dev/null @@ -1,290 +0,0 @@ -## What is this about? - -## How to enable UI authentication? - -1. In the config file gear.conf, find the entry `gearpump-ui.gearpump.ui-security.authentication-enabled` and change the value to true - - :::bash - gearpump-ui.gearpump.ui-security.authentication-enabled = true - - - Restart the UI dashboard, and then UI authentication is enabled. It will prompt for a user name and password. - -## How many authentication methods does the Gearpump UI server support? - -Currently, it supports: - -1. Username-Password based authentication and -2. OAuth2 based authentication. - -User-Password based authentication is enabled whenever `gearpump-ui.gearpump.ui-security.authentication-enabled` is true, - and **CANNOT** be disabled. - -The UI server admin can also choose to enable an **auxiliary** OAuth2 authentication channel. - -## User-Password based authentication - - User-Password based authentication covers all authentication scenarios that require - the user to enter an explicit username and password. - - Gearpump provides a built-in ConfigFileBasedAuthenticator, which verifies the user name and password - against password hashcodes stored in config files. - - However, developers can extend `org.apache.gearpump.security.Authenticator` to provide a custom - User-Password based authenticator, to support LDAP, Kerberos, and Database-based authentication... - -### ConfigFileBasedAuthenticator: built-in User-Password Authenticator - -ConfigFileBasedAuthenticator stores all user names and password hashcodes in the configuration file gear.conf. Here
Here -is the steps to configure ConfigFileBasedAuthenticator. - -#### How to add or remove user? - -For the default authentication plugin, it has three categories of users: admins, users, and guests. - -* admins: have unlimited permission, like shutdown a cluster, add/remove machines. -* users: have limited permission to submit an application and etc.. -* guests: can not submit/kill applications, but can view the application status. - -System administrator can add or remove user by updating config file `conf/gear.conf`. - -Suppose we want to add user jerry as an administrator, here are the steps: - -1. Pick a password, and generate the digest for this password. Suppose we use password `ilovegearpump`, - to generate the digest: - - :::bash - bin/gear org.apache.gearpump.security.PasswordUtil -password ilovegearpump - - - It will generate a digest value like this: - - :::bash - CgGxGOxlU8ggNdOXejCeLxy+isrCv0TrS37HwA== - - -2. Change config file conf/gear.conf at path `gearpump-ui.gearpump.ui-security.config-file-based-authenticator.admins`, - add user `jerry` in this list: - - :::bash - admins = { - ## Default Admin. Username: admin, password: admin - ## !!! Please replace this builtin account for production cluster for security reason. !!! - "admin" = "AeGxGOxlU8QENdOXejCeLxy+isrCv0TrS37HwA==" - "jerry" = "CgGxGOxlU8ggNdOXejCeLxy+isrCv0TrS37HwA==" - } - - -3. Restart the UI dashboard by `bin/services` to make the change effective. - -4. Group "admins" have very unlimited permission, you may want to restrict the permission. In that case - you can modify `gearpump-ui.gearpump.ui-security.config-file-based-authenticator.users` or - `gearpump-ui.gearpump.ui-security.config-file-based-authenticator.guests`. - -5. See description at `conf/gear.conf` to find more information. - -#### What is the default user and password? - -For ConfigFileBasedAuthenticator, Gearpump distribution is shipped with two default users: - -1. username: admin, password: admin -2. 
username: guest, password: guest - -User `admin` has unlimited permissions, while `guest` can only view the application status. - -For security reasons, you should remove the default users `admin` and `guest` on production clusters. - -#### Is this secure? - -First, we do NOT store any user password in any form, so only the user knows the password. -A one-way hash digest is used to verify the password the user enters. - -### How to develop a custom User-Password Authenticator for LDAP, Database, etc. - -If a developer chooses to define their own User-Password based authenticator, they must modify the configuration option: - - :::bash - ## Replace "org.apache.gearpump.security.CustomAuthenticator" with your real authenticator class. - gearpump.ui-security.authenticator = "org.apache.gearpump.security.CustomAuthenticator" - - -Make sure CustomAuthenticator extends the interface: - - :::scala - trait Authenticator { - - def authenticate(user: String, password: String, ec: ExecutionContext): Future[AuthenticationResult] - } - - -## OAuth2 based authentication - -OAuth2 based authentication is commonly used to achieve social login with a social network account. - -Gearpump provides generic OAuth2 authentication support which allows users to extend it to support new authentication sources. - -Basically, OAuth2 based authentication consists of these steps: - 1. The user accesses the Gearpump UI website, and chooses to log in with an OAuth2 server. - 2. The Gearpump UI website redirects the user to the OAuth2 server's authorization endpoint. - 3. The end user completes the authorization in the domain of the OAuth2 server. - 4. The OAuth2 server redirects the user back to the Gearpump UI server. - 5. The Gearpump UI server verifies the tokens and extracts credentials from query - parameters and form fields. 
- -### Terminologies - -For terms like client ID and client secret, please refer to [RFC 6749](https://tools.ietf.org/html/rfc6749) - -### Enable web proxy for UI server - -To enable OAuth2 authentication, the Gearpump UI server should have network access to the OAuth2 server, as - some requests are initiated directly inside the Gearpump UI server. So, if you are behind a firewall, make - sure you have configured the proxy properly for the UI server. - -#### If you are on Windows - - :::bash - set JAVA_OPTS=-Dhttp.proxyHost=xx.com -Dhttp.proxyPort=8088 -Dhttps.proxyHost=xx.com -Dhttps.proxyPort=8088 - bin/services - - -#### If you are on Linux - - :::bash - export JAVA_OPTS="-Dhttp.proxyHost=xx.com -Dhttp.proxyPort=8088 -Dhttps.proxyHost=xx.com -Dhttps.proxyPort=8088" - bin/services - - -### Google Plus OAuth2 Authenticator - -The Google Plus OAuth2 Authenticator does authentication with the Google OAuth2 service. It extracts the email address -from the Google user profile as credentials. - -To use the Google OAuth2 Authenticator, there are several steps: - -1. Register your application (the Gearpump UI server here) as an application in the Google developer console. -2. Configure the Google OAuth2 information in gear.conf -3. Configure the network proxy for the Gearpump UI server if applicable. - -#### Step1: Register your website as an OAuth2 Application on Google - -1. Create an application representing your website at [https://console.developers.google.com](https://console.developers.google.com) -2. In the "API Manager" of your created application, enable the API "Google+ API" -3. Create an OAuth client ID for this application. In the "Credentials" tab of "API Manager", -choose "Create credentials", and then select OAuth client ID. Follow the wizard -to set the callback URL and generate the client ID and client secret. - -**NOTE:** The callback URL is NOT optional. - -#### Step2: Configure the OAuth2 information in gear.conf - -1. 
Enable OAuth2 authentication by setting `gearpump.ui-security.oauth2-authenticator-enabled` -to true. -2. Configure the section `gearpump.ui-security.oauth2-authenticators.google` in gear.conf. Please make sure -the class name, client ID, client secret, and callback URL are set properly. - -**NOTE:** The callback URL set here should match what is configured on Google in step 1. - -#### Step3: Configure the network proxy if applicable. - -To enable OAuth2 authentication, the Gearpump UI server should have network access to the Google service, as - some requests are initiated directly inside the Gearpump UI server. So, if you are behind a firewall, make - sure you have configured the proxy properly for the UI server. - -For a guide on how to configure the web proxy for the UI server, please refer to the section "Enable web proxy for UI server" above. - -#### Step4: Restart the UI server and try to click the Google login icon on the UI server. - -### CloudFoundry UAA server OAuth2 Authenticator - -CloudFoundryUaaAuthenticator does authentication by using the CloudFoundry UAA OAuth2 service. It extracts the email address - from the user profile as credentials. - -For what UAA (User Account and Authentication Service) is, please see the guide: [UAA](https://github.com/cloudfoundry/uaa) - -To use the CloudFoundry UAA OAuth2 Authenticator, there are several steps: - -1. Register your application (the Gearpump UI server here) as an application to UAA with the helper tool `uaac`. -2. Configure the OAuth2 information in gear.conf -3. Configure the network proxy for the Gearpump UI server if applicable. - -#### Step1: Register your application to UAA with `uaac` - -1. Check the tutorial on uaac at [https://docs.cloudfoundry.org/adminguide/uaa-user-management.html](https://docs.cloudfoundry.org/adminguide/uaa-user-management.html) -2. Open a bash shell, and set the UAA server by the command `uaac target` - - :::bash - uaac target [your uaa server url] - - -3. Log in as user admin by - - :::bash - uaac token client get admin -s MyAdminPassword - - -4. 
Create a new Application (Client) in UAA, - - :::bash - uaac client add [your_client_id] - --scope "openid cloud_controller.read" - --authorized_grant_types "authorization_code client_credentials refresh_token" - --authorities "openid cloud_controller.read" - --redirect_uri [your_redirect_url] - --autoapprove true - --secret [your_client_secret] - - -#### Step2: Configure the OAuth2 information in gear.conf - -1. Enable OAuth2 authentication by setting `gearpump.ui-security.oauth2-authenticator-enabled` to true. -2. Navigate to the section `gearpump.ui-security.oauth2-authenticators.cloudfoundryuaa` -3. Configure the `gearpump.ui-security.oauth2-authenticators.cloudfoundryuaa` section in gear.conf. -Please make sure the class name, client ID, client secret, and callback URL are set properly. - -**NOTE:** The callback URL here should match what you set on CloudFoundry UAA in step 1. - -#### Step3: Configure the network proxy for the Gearpump UI server if applicable - -To enable OAuth2 authentication, the Gearpump UI server should have network access to the OAuth2 service, as - some requests are initiated directly inside the Gearpump UI server. So, if you are behind a firewall, make - sure you have configured the proxy properly for the UI server. - -For a guide on how to configure the web proxy for the UI server, please refer to the section "Enable web proxy for UI server" above. - -#### Step4: Restart the UI server and try to click the CloudFoundry login icon on the UI server. - -#### Step5: You can also enable an additional authenticator for CloudFoundry UAA by setting the config: - - :::bash - additional-authenticator-enabled = true - - -Please see the description in gear.conf for more information. - -#### Extending OAuth2Authenticator to support a new authorization service like Facebook or Twitter - -You can follow the Google OAuth2 example code to define a custom OAuth2Authenticator. Basically, the steps include: - -1. Define an OAuth2Authenticator implementation. - -2. 
Add a configuration entry under `gearpump.ui-security.oauth2-authenticators`. For example:

    ## name of this authenticator
    "socialnetworkx" {
      "class" = "org.apache.gearpump.services.security.oauth2.impl.SocialNetworkXAuthenticator"

      ## Please make sure this URL matches the name
      "callback" = "http://127.0.0.1:8090/login/oauth2/socialnetworkx/callback"

      "clientId" = "gearpump_test2"
      "clientSecret" = "gearpump_test2"
      "defaultUserRole" = "guest"

      ## Make sure socialnetworkx.png exists under dashboard/icons
      "icon" = "/icons/socialnetworkx.png"
    }

The configuration entry is supposed to be used by the class `SocialNetworkXAuthenticator`.
\ No newline at end of file
http://git-wip-us.apache.org/repos/asf/incubator-gearpump/blob/5f90b70f/docs/docs/deployment/deployment-yarn.md ---------------------------------------------------------------------- diff --git a/docs/docs/deployment/deployment-yarn.md b/docs/docs/deployment/deployment-yarn.md deleted file mode 100644 index 401fa46..0000000 --- a/docs/docs/deployment/deployment-yarn.md +++ /dev/null @@ -1,135 +0,0 @@

## How to launch a Gearpump cluster on YARN

1. Upload the `gearpump-{{SCALA_BINARY_VERSION}}-{{GEARPUMP_VERSION}}.zip` to a remote HDFS folder; we suggest putting it under `/usr/lib/gearpump/gearpump-{{SCALA_BINARY_VERSION}}-{{GEARPUMP_VERSION}}.zip`

2. Make sure the home directory on HDFS is already created and all read-write rights are granted for the user. For example, user gear's home directory is `/user/gear`

3. Put the YARN configurations under the classpath. Before calling `yarnclient launch`, make sure you have put all YARN configuration files under the classpath. Typically, you can just copy all files under `$HADOOP_HOME/etc/hadoop` from one of the YARN cluster machines to `conf/yarnconf` of Gearpump. `$HADOOP_HOME` points to the Hadoop installation directory.

4. 
Launch the Gearpump cluster on YARN

        :::bash
        yarnclient launch -package /usr/lib/gearpump/gearpump-{{SCALA_BINARY_VERSION}}-{{GEARPUMP_VERSION}}.zip

    If you don't specify the package path, it will read the default package path (`gearpump.yarn.client.package-path`) from `gear.conf`.

    **NOTE:** You may need to execute `chmod +x bin/*` in a shell to make the script file `yarnclient` executable.

5. After launching, you can browse the Gearpump UI via the YARN resource manager dashboard.

## How to configure the resource limitation of a Gearpump cluster

Before launching a Gearpump cluster, please change the configuration section `gearpump.yarn` in `gear.conf` to configure the resource limitation, like:

1. The number of worker containers.
2. The YARN container memory size for worker and master.

## How to submit an application to a Gearpump cluster

To submit a jar to the Gearpump cluster, we first need to know the Master address, so we need to get an active configuration file first.

There are two ways to get an active configuration file:

1. Option 1: specify the "-output" option when you launch the cluster.

        :::bash
        yarnclient launch -package /usr/lib/gearpump/gearpump-{{SCALA_BINARY_VERSION}}-{{GEARPUMP_VERSION}}.zip -output /tmp/mycluster.conf

    It will print to the console something like:

        :::bash
        ==Application Id: application_1449802454214_0034

2. Option 2: Query the active configuration file.

        :::bash
        yarnclient getconfig -appid <yarn application id> -output /tmp/mycluster.conf

    The YARN application id can be found in the output of Option 1 or on the YARN dashboard.

3. After you have downloaded the configuration file, you can launch an application with that config file.

        :::bash
        gear app -jar examples/wordcount-{{SCALA_BINARY_VERSION}}-{{GEARPUMP_VERSION}}.jar -conf /tmp/mycluster.conf

4. 
To run a Storm application over Gearpump on YARN, please store the configuration file with `-output application.conf` and then launch the Storm application with

        :::bash
        storm -jar examples/storm-{{SCALA_BINARY_VERSION}}-{{GEARPUMP_VERSION}}.jar storm.starter.ExclamationTopology exclamation

5. Now the application is running. To check this:

        :::bash
        gear info -conf /tmp/mycluster.conf

6. To start a UI server, please do:

        :::bash
        services -conf /tmp/mycluster.conf

    The default username and password are "admin:admin"; you can check [UI Authentication](deployment-ui-authentication) to find out how to manage users.

## How to add/remove machines dynamically

The Gearpump YARN tool allows you to dynamically add and remove machines. Here are the steps:

1. First, query to get the active resources.

        :::bash
        yarnclient query -appid <yarn application id>

    The console output will show how many workers and masters there are. For example, I have output like this:

        :::bash
        masters:
        container_1449802454214_0034_01_000002(IDHV22-01:35712)
        workers:
        container_1449802454214_0034_01_000003(IDHV22-01:35712)
        container_1449802454214_0034_01_000006(IDHV22-01:35712)

2. To add a new worker machine, you can do:

        :::bash
        yarnclient addworker -appid <yarn application id> -count 2

    This will add two new worker machines. Run the command in the first step to check whether the change is effective.

3. To remove old machines, use:

        :::bash
        yarnclient removeworker -appid <yarn application id> -container <worker container id>

    The worker container id can be found in the output of step 1. For example, "container_1449802454214_0034_01_000006" is a valid container id.

## Other usage

1. To kill a cluster,

        :::bash
        yarnclient kill -appid <yarn application id>

    **NOTE:** If the application was not launched successfully, this command won't work. Please use "yarn application -kill <appId>" instead.

2. 
To check the Gearpump version

        :::bash
        yarnclient version -appid <yarn application id>

http://git-wip-us.apache.org/repos/asf/incubator-gearpump/blob/5f90b70f/docs/docs/deployment/get-gearpump-distribution.md ---------------------------------------------------------------------- diff --git a/docs/docs/deployment/get-gearpump-distribution.md b/docs/docs/deployment/get-gearpump-distribution.md deleted file mode 100644 index 8a31113..0000000 --- a/docs/docs/deployment/get-gearpump-distribution.md +++ /dev/null @@ -1,83 +0,0 @@

### Prepare the binary

You can either download a pre-built release package or choose to build from source code.

#### Download Release Binary

If you choose to use a pre-built package, then you don't need to build from source code. The release package can be downloaded from:

##### [Download page](http://gearpump.incubator.apache.org/downloads.html)

#### Build from Source code

If you choose to build the package from source code yourself, you can follow these steps:

1). Clone the Gearpump repository

    :::bash
    git clone https://github.com/apache/incubator-gearpump.git
    cd incubator-gearpump

2). Build the package

    :::bash
    ## Please use Scala 2.11
    ## The target package path: output/target/gearpump-{{SCALA_BINARY_VERSION}}-{{GEARPUMP_VERSION}}.zip
    sbt clean assembly packArchiveZip

After the build, there will be a package file gearpump-{{SCALA_BINARY_VERSION}}-{{GEARPUMP_VERSION}}.zip generated under the output/target/ folder.

**NOTE:** Please set the JAVA_HOME environment variable before the build.

On Linux:

    :::bash
    export JAVA_HOME={path/to/jdk/root/path}

On Windows:

    :::bash
    set JAVA_HOME={path/to/jdk/root/path}

**NOTE:** The build requires a network connection. If you are behind an enterprise proxy, make sure you have set the proxy in your environment before running the build commands.
For Windows:

    :::bash
    set HTTP_PROXY=http://host:port
    set HTTPS_PROXY=http://host:port

For Linux:

    :::bash
    export HTTP_PROXY=http://host:port
    export HTTPS_PROXY=http://host:port

### Gearpump package structure

You need to extract the `.zip` file to use it. On Linux, you can

    :::bash
    unzip gearpump-{{SCALA_BINARY_VERSION}}-{{GEARPUMP_VERSION}}.zip

After decompression, the directory structure looks like picture 1.

Under the bin/ folder, there are script files for Linux (bash scripts) and Windows (.bat scripts).

script | function
--------|------------
local | You can start the Gearpump cluster in a single JVM (local mode), or in a distributed cluster (cluster mode). To start the cluster in local mode, you can use the local/local.bat helper script; it is very useful for development or troubleshooting.
master | To start Gearpump in cluster mode, you need to start one or more master nodes, which represent the global resource management center. master/master.bat is the launcher script to boot a master node.
worker | To start Gearpump in cluster mode, you also need to start several workers, with each worker representing a set of local resources. worker/worker.bat is the launcher script to start a worker node.
services | This script is used to start the backend REST service and other services for the frontend UI dashboard (default user "admin:admin").

Please check [Command Line Syntax](../introduction/commandline) for more information about each script.
http://git-wip-us.apache.org/repos/asf/incubator-gearpump/blob/5f90b70f/docs/docs/deployment/hardware-requirement.md ---------------------------------------------------------------------- diff --git a/docs/docs/deployment/hardware-requirement.md b/docs/docs/deployment/hardware-requirement.md deleted file mode 100644 index dfe765b..0000000 --- a/docs/docs/deployment/hardware-requirement.md +++ /dev/null @@ -1,30 +0,0 @@

### Pre-requisite

A Gearpump cluster can be installed on Windows and Linux.

Before installation, you need to decide how many machines will be used to run this cluster.

For each machine, the requirements are listed in the table below.

**Table: Environment requirement on a single machine**

Resource | Requirements
------------ | ---------------------------
Memory | 2GB free memory is required to run the cluster. For any production system, 32GB memory is recommended.
Java | JRE 6 or above
User permission | Root permission is not required
Network | Ethernet (TCP/IP)
CPU | Nothing special
HDFS installation | Not required by default. You only need to install it when you want to store the application jars in HDFS.
Kafka installation | Not required by default. You need to install Kafka when you want the at-least-once message delivery feature. Currently, the only supported data source for this feature is Kafka.

**Table: The default ports used in Gearpump:**

Usage | Port | Description
------------- | ---------------|------------
Dashboard UI | 8090 | Web UI.
Dashboard web socket service | 8091 | UI backend web socket service for long connections.
Master port | 3000 | Every other role, like worker, appmaster, executor, and user, uses this port to communicate with the Master.

You need to ensure that your firewall has not blocked these ports so that Gearpump can work correctly. You can also modify the port configuration; check the [Configuration](deployment-configuration) section for details.
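Since the ports above must be reachable between nodes, a quick connectivity check can save debugging time before a deployment. Below is a minimal, self-contained sketch (not part of Gearpump itself; the host name and the port list mirroring the defaults in the table are assumptions to adapt to your setup):

    :::scala
    import java.net.{InetSocketAddress, Socket}

    object PortCheck {
      // Default Gearpump ports from the table above: UI, web socket, Master.
      val defaultPorts = Seq(8090, 8091, 3000)

      // Returns true if a TCP connection to host:port succeeds within timeoutMs.
      def reachable(host: String, port: Int, timeoutMs: Int = 1000): Boolean = {
        val socket = new Socket()
        try {
          socket.connect(new InetSocketAddress(host, port), timeoutMs)
          true
        } catch {
          case _: Exception => false
        } finally {
          socket.close()
        }
      }

      def main(args: Array[String]): Unit = {
        val host = if (args.nonEmpty) args(0) else "127.0.0.1"
        defaultPorts.foreach { port =>
          println(s"$host:$port reachable = ${reachable(host, port)}")
        }
      }
    }

Run it against each cluster machine; a `false` for the Master port usually points at a firewall rule rather than a Gearpump misconfiguration.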
http://git-wip-us.apache.org/repos/asf/incubator-gearpump/blob/5f90b70f/docs/docs/dev/dev-connectors.md ---------------------------------------------------------------------- diff --git a/docs/docs/dev/dev-connectors.md b/docs/docs/dev/dev-connectors.md deleted file mode 100644 index 01deb3e..0000000 --- a/docs/docs/dev/dev-connectors.md +++ /dev/null @@ -1,237 +0,0 @@

## Basic Concepts

`DataSource` and `DataSink` are the two main concepts Gearpump uses to connect with the outside world.

### DataSource

`DataSource` is the start point of a streaming processing flow.

### DataSink

`DataSink` is the end point of a streaming processing flow.

## Implemented Connectors

### `DataSource` implemented

Currently, we have the following `DataSource` implementations supported.

Name | Description
-----| ----------
`CollectionDataSource` | Converts a collection to a recurring data source. E.g. `seq(1, 2, 3)` will output `1,2,3,1,2,3...`.
`KafkaSource` | Reads from Kafka.

### `DataSink` implemented

Currently, we have the following `DataSink` implementations supported.

Name | Description
-----| ----------
`HBaseSink` | Writes the message to HBase. The message to write must be an HBase `Put` or a tuple of `(rowKey, family, column, value)`.
`KafkaSink` | Writes to Kafka.

## Use of Connectors

### Use of Kafka connectors

To use the Kafka connectors in your application, you first need to add the `gearpump-external-kafka` library dependency to your application:

#### SBT

    :::sbt
    "org.apache.gearpump" %% "gearpump-external-kafka" % {{GEARPUMP_VERSION}}

#### XML

    :::xml
    <dependency>
      <groupId>org.apache.gearpump</groupId>
      <artifactId>gearpump-external-kafka</artifactId>
      <version>{{GEARPUMP_VERSION}}</version>
    </dependency>

This is a simple example to read from Kafka and write it back using `KafkaSource` and `KafkaSink`. Users can optionally set a `CheckpointStoreFactory` such that Kafka offsets are checkpointed and at-least-once message delivery is guaranteed.
#### Low level API

    :::scala
    val appConfig = UserConfig.empty
    val props = new Properties
    props.put(KafkaConfig.ZOOKEEPER_CONNECT_CONFIG, zookeeperConnect)
    props.put(KafkaConfig.BOOTSTRAP_SERVERS_CONFIG, brokerList)
    props.put(KafkaConfig.CHECKPOINT_STORE_NAME_PREFIX_CONFIG, appName)
    val source = new KafkaSource(sourceTopic, props)
    val checkpointStoreFactory = new KafkaStoreFactory(props)
    source.setCheckpointStore(checkpointStoreFactory)
    val sourceProcessor = DataSourceProcessor(source, sourceNum)
    val sink = new KafkaSink(sinkTopic, props)
    val sinkProcessor = DataSinkProcessor(sink, sinkNum)
    val partitioner = new ShufflePartitioner
    val computation = sourceProcessor ~ partitioner ~> sinkProcessor
    val app = StreamApplication(appName, Graph(computation), appConfig)

#### High level API

    :::scala
    val props = new Properties
    val appName = "KafkaDSL"
    props.put(KafkaConfig.ZOOKEEPER_CONNECT_CONFIG, zookeeperConnect)
    props.put(KafkaConfig.BOOTSTRAP_SERVERS_CONFIG, brokerList)
    props.put(KafkaConfig.CHECKPOINT_STORE_NAME_PREFIX_CONFIG, appName)

    val app = StreamApp(appName, context)

    if (atLeastOnce) {
      val checkpointStoreFactory = new KafkaStoreFactory(props)
      KafkaDSL.createAtLeastOnceStream(app, sourceTopic, checkpointStoreFactory, props, sourceNum)
        .writeToKafka(sinkTopic, props, sinkNum)
    } else {
      KafkaDSL.createAtMostOnceStream(app, sourceTopic, props, sourceNum)
        .writeToKafka(sinkTopic, props, sinkNum)
    }

In the above example, configurations are set through Java properties and shared by `KafkaSource`, `KafkaSink` and `KafkaCheckpointStoreFactory`. Their configurations can be defined differently, as below.
#### `KafkaSource` configurations

Name | Descriptions | Type | Default
---- | ------------ | ---- | -------
`KafkaConfig.ZOOKEEPER_CONNECT_CONFIG` | Zookeeper connect string for Kafka topics management | String |
`KafkaConfig.CLIENT_ID_CONFIG` | An id string to pass to the server when making requests | String | ""
`KafkaConfig.GROUP_ID_CONFIG` | A string that uniquely identifies a set of consumers within the same consumer group | String | ""
`KafkaConfig.FETCH_SLEEP_MS_CONFIG` | The amount of time (ms) to sleep when hitting fetch.threshold | Int | 100
`KafkaConfig.FETCH_THRESHOLD_CONFIG` | Size of the internal queue to keep Kafka messages. Stop fetching and go to sleep when hitting the threshold | Int | 10000
`KafkaConfig.PARTITION_GROUPER_CLASS_CONFIG` | Partition grouper class to group partitions among source tasks | Class | DefaultPartitionGrouper
`KafkaConfig.MESSAGE_DECODER_CLASS_CONFIG` | Message decoder class to decode raw bytes from Kafka | Class | DefaultMessageDecoder
`KafkaConfig.TIMESTAMP_FILTER_CLASS_CONFIG` | Timestamp filter class to filter out late messages | Class | DefaultTimeStampFilter

#### `KafkaSink` configurations

Name | Descriptions | Type | Default
---- | ------------ | ---- | -------
`KafkaConfig.BOOTSTRAP_SERVERS_CONFIG` | A list of host/port pairs to use for establishing the initial connection to the Kafka cluster | String |
`KafkaConfig.CLIENT_ID_CONFIG` | An id string to pass to the server when making requests | String | ""

#### `KafkaCheckpointStoreFactory` configurations

Name | Descriptions | Type | Default
---- | ------------ | ---- | -------
`KafkaConfig.ZOOKEEPER_CONNECT_CONFIG` | Zookeeper connect string for Kafka topics management | String |
`KafkaConfig.BOOTSTRAP_SERVERS_CONFIG` | A list of host/port pairs to use for establishing the initial connection to the Kafka cluster | String |
`KafkaConfig.CHECKPOINT_STORE_NAME_PREFIX_CONFIG` | Name prefix for the checkpoint store | String | ""
`KafkaConfig.REPLICATION_FACTOR` | Replication factor for the checkpoint store topic | Int | 1

### Use of `HBaseSink`

To use `HBaseSink` in your application, you first need to add the `gearpump-external-hbase` library dependency to your application:

#### SBT

    :::sbt
    "org.apache.gearpump" %% "gearpump-external-hbase" % {{GEARPUMP_VERSION}}

#### XML

    :::xml
    <dependency>
      <groupId>org.apache.gearpump</groupId>
      <artifactId>gearpump-external-hbase</artifactId>
      <version>{{GEARPUMP_VERSION}}</version>
    </dependency>

To connect to HBase, you need to provide the following info:

 * the HBase configuration to tell which HBase service to connect to
 * the table name (you must create the table yourself, see the [HBase documentation](https://hbase.apache.org/book.html))

Then, you can use `HBaseSink` in your application:

    :::scala
    //create the HBase data sink
    val sink = HBaseSink(UserConfig.empty, tableName, HBaseConfiguration.create())

    //create Gearpump Processor
    val sinkProcessor = DataSinkProcessor(sink, parallelism)

    :::scala
    //assume stream is a normal `Stream` in DSL
    stream.writeToHbase(UserConfig.empty, tableName, parallelism, "write to HBase")

You can tune the connection to HBase via the HBase configuration passed in. If it is not passed, Gearpump will search the local classpath for a valid HBase configuration (`hbase-site.xml`).

Note that, due to the issue discussed [here](http://stackoverflow.com/questions/24456484/hbase-managed-zookeeper-suddenly-trying-to-connect-to-localhost-instead-of-zooke), you may need to create an additional configuration for your HBase sink:

    :::scala
    def hadoopConfig = {
      val conf = new Configuration()
      conf.set("hbase.zookeeper.quorum", "zookeeperHost")
      conf.set("hbase.zookeeper.property.clientPort", "2181")
      conf
    }
    val sink = HBaseSink(UserConfig.empty, tableName, hadoopConfig)

## How to implement your own `DataSource`

To implement your own `DataSource`, you need to implement two things:

1. 
The data source itself
2. A helper class to ease its usage in the DSL

### Implement your own `DataSource`

You need to implement a class derived from `org.apache.gearpump.streaming.transaction.api.TimeReplayableSource`.

### Implement DSL helper (Optional)

If you would like to use your `DataSource` in the DSL, it is better to implement your own DSL helper. You can refer to `KafkaDSLUtil` as an example in the Gearpump source.

Below is a code snippet from `KafkaDSLUtil`:

    :::scala
    object KafkaDSLUtil {

      def createStream[T](
          app: StreamApp,
          topics: String,
          parallelism: Int,
          description: String,
          properties: Properties): dsl.Stream[T] = {
        app.source[T](new KafkaSource(topics, properties), parallelism, description)
      }
    }

## How to implement your own `DataSink`

To implement your own `DataSink`, you need to implement two things:

1. The data sink itself
2. A helper class to make it easy to use in the DSL

### Implement your own `DataSink`

You need to implement a class derived from `org.apache.gearpump.streaming.sink.DataSink`.

### Implement DSL helper (Optional)

If you would like to use your `DataSink` in the DSL, it is better to implement your own DSL helper. You can refer to `HBaseDSLSink` as an example in the Gearpump source.
Below is a code snippet from `HBaseDSLSink`:

    :::scala
    class HBaseDSLSink[T](stream: Stream[T]) {
      def writeToHbase(userConfig: UserConfig, table: String, parallism: Int, description: String): Stream[T] = {
        stream.sink(HBaseSink[T](userConfig, table), parallism, userConfig, description)
      }

      def writeToHbase(userConfig: UserConfig, configuration: Configuration, table: String, parallism: Int, description: String): Stream[T] = {
        stream.sink(HBaseSink[T](userConfig, table, configuration), parallism, userConfig, description)
      }
    }

    object HBaseDSLSink {
      implicit def streamToHBaseDSLSink[T](stream: Stream[T]): HBaseDSLSink[T] = {
        new HBaseDSLSink[T](stream)
      }
    }
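The DSL helper pattern above relies on a Scala implicit conversion to graft a connector-specific method onto `Stream[T]`. The same structure can be sketched with self-contained toy types; everything below (`ToyStream`, `ConsoleDSLSink`, `writeToConsole`) is illustrative only and not part of the Gearpump API:

    :::scala
    import scala.language.implicitConversions

    // Toy stand-in for Gearpump's dsl.Stream: it just records sink descriptions.
    case class ToyStream[T](sinks: List[String] = Nil) {
      def sink(description: String): ToyStream[T] = copy(sinks = sinks :+ description)
    }

    // Helper class wrapping a stream and adding a connector-specific method,
    // mirroring the structure of HBaseDSLSink above.
    class ConsoleDSLSink[T](stream: ToyStream[T]) {
      def writeToConsole(description: String): ToyStream[T] =
        stream.sink(s"console: $description")
    }

    object ConsoleDSLSink {
      // The implicit conversion makes writeToConsole available on any ToyStream.
      implicit def streamToConsoleDSLSink[T](stream: ToyStream[T]): ConsoleDSLSink[T] =
        new ConsoleDSLSink[T](stream)
    }

    object Demo {
      import ConsoleDSLSink._
      def run(): List[String] =
        ToyStream[String]().writeToConsole("debug output").sinks
    }

With the implicit in scope, `stream.writeToConsole(...)` reads as if the method were defined on the stream itself, which is exactly how `stream.writeToHbase(...)` works in the snippet above.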
