[15/31] storm git commit: STORM-1617: 0.10.x release docs

bobby Tue, 22 Mar 2016 08:32:34 -0700

STORM-1617: 0.10.x release docs

Conflicts:
        docs/images/topology.png



Project: http://git-wip-us.apache.org/repos/asf/storm/repo
Commit: http://git-wip-us.apache.org/repos/asf/storm/commit/97f0558e
Tree: http://git-wip-us.apache.org/repos/asf/storm/tree/97f0558e
Diff: http://git-wip-us.apache.org/repos/asf/storm/diff/97f0558e

Branch: refs/heads/1.x-branch
Commit: 97f0558e13c5ab7aecc30d7cc6b88fb57aa1e98e
Parents: d491c3f
Author: Robert (Bobby) Evans <[email protected]>
Authored: Sat Mar 19 12:33:31 2016 -0500
Committer: Robert (Bobby) Evans <[email protected]>
Committed: Sat Mar 19 12:41:23 2016 -0500

----------------------------------------------------------------------
 docs/Clojure-DSL.md                             |   8 +-
 docs/Command-line-client.md                     |   8 +-
 docs/Common-patterns.md                         |  18 +-
 docs/Concepts.md                                |  26 +-
 docs/Configuration.md                           |   6 +-
 docs/Contributing-to-Storm.md                   |   4 +-
 docs/Creating-a-new-Storm-project.md            |   8 +-
 docs/DSLs-and-multilang-adapters.md             |   5 +-
 docs/Daemon-Fault-Tolerance.md                  |  30 +
 ...Defining-a-non-jvm-language-dsl-for-storm.md |   4 +-
 docs/Distributed-RPC.md                         |   4 +-
 docs/FAQ.md                                     |  14 +-
 docs/Guaranteeing-message-processing.md         |  10 +-
 docs/Hooks.md                                   |   4 +-
 docs/Implementation-docs.md                     |   3 +-
 docs/Kestrel-and-Storm.md                       |   4 +-
 docs/Lifecycle-of-a-topology.md                 |  72 +-
 docs/Local-mode.md                              |   2 +
 docs/Maven.md                                   |  52 +-
 docs/Message-passing-implementation.md          |  36 +-
 docs/Metrics.md                                 |  20 +-
 docs/Multilang-protocol.md                      |  76 +-
 docs/Rationale.md                               |   2 +
 ...unning-topologies-on-a-production-cluster.md |   4 +-
 docs/SECURITY.md                                | 459 +++++++++-
 docs/STORM-UI-REST-API.md                       |   2 +-
 docs/Serialization.md                           |   8 +-
 docs/Setting-up-a-Storm-cluster.md              |  46 +-
 docs/Setting-up-development-environment.md      |  16 +-
 docs/Spout-implementations.md                   |   2 +
 docs/Structure-of-the-codebase.md               |  92 +-
 docs/Support-for-non-java-languages.md          |   2 +
 docs/Transactional-topologies.md                |  14 +-
 docs/Trident-API-Overview.md                    | 228 ++++-
 docs/Trident-spouts.md                          |  10 +-
 docs/Trident-state.md                           |  15 +-
 docs/Trident-tutorial.md                        |   5 +-
 docs/Troubleshooting.md                         |  40 +-
 docs/Tutorial.md                                |  22 +-
 ...nding-the-parallelism-of-a-Storm-topology.md |  30 +-
 docs/Using-non-JVM-languages-with-Storm.md      |   1 +
 docs/flux.md                                    | 835 +++++++++++++++++++
 docs/images/topology.png                        | Bin 0 -> 23147 bytes
 docs/index.md                                   | 118 ++-
 docs/storm-eventhubs.md                         |  40 +
 docs/storm-hbase.md                             | 241 ++++++
 docs/storm-hdfs.md                              | 368 ++++++++
 docs/storm-hive.md                              | 111 +++
 docs/storm-jdbc.md                              | 285 +++++++
 docs/storm-kafka.md                             | 287 +++++++
 docs/storm-redis.md                             | 258 ++++++
 51 files changed, 3599 insertions(+), 356 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/storm/blob/97f0558e/docs/Clojure-DSL.md
----------------------------------------------------------------------
diff --git a/docs/Clojure-DSL.md b/docs/Clojure-DSL.md
index b3109fa..234febe 100644
--- a/docs/Clojure-DSL.md
+++ b/docs/Clojure-DSL.md
@@ -1,7 +1,9 @@
 ---
+title: Clojure DSL
 layout: documentation
+documentation: true
 ---
-Storm comes with a Clojure DSL for defining spouts, bolts, and topologies. The 
Clojure DSL has access to everything the Java API exposes, so if you're a 
Clojure user you can code Storm topologies without touching Java at all. The 
Clojure DSL is defined in the source in the 
[backtype.storm.clojure](https://github.com/apache/incubator-storm/blob/0.5.3/src/clj/backtype/storm/clojure.clj)
 namespace.
+Storm comes with a Clojure DSL for defining spouts, bolts, and topologies. The 
Clojure DSL has access to everything the Java API exposes, so if you're a 
Clojure user you can code Storm topologies without touching Java at all. The 
Clojure DSL is defined in the source in the 
[backtype.storm.clojure]({{page.git-blob-base}}/storm-core/src/clj/backtype/storm/clojure.clj)
 namespace.
 
 This page outlines all the pieces of the Clojure DSL, including:
 
@@ -15,7 +17,7 @@ This page outlines all the pieces of the Clojure DSL, 
including:
 
 To define a topology, use the `topology` function. `topology` takes in two 
arguments: a map of "spout specs" and a map of "bolt specs". Each spout and 
bolt spec wires the code for the component into the topology by specifying 
things like inputs and parallelism.
 
-Let's take a look at an example topology definition [from the storm-starter 
project](https://github.com/nathanmarz/storm-starter/blob/master/src/clj/storm/starter/clj/word_count.clj):
+Let's take a look at an example topology definition [from the storm-starter 
project]({{page.git-blob-base}}/examples/storm-starter/src/clj/storm/starter/clj/word_count.clj):
 
 ```clojure
 (topology
@@ -201,7 +203,7 @@ The signature for `defspout` looks like the following:
 
 If you leave out the option map, it defaults to {:prepare true}. The output 
declaration for `defspout` has the same syntax as `defbolt`.
 
-Here's an example `defspout` implementation from 
[storm-starter](https://github.com/nathanmarz/storm-starter/blob/master/src/clj/storm/starter/clj/word_count.clj):
+Here's an example `defspout` implementation from 
[storm-starter]({{page.git-blob-base}}/examples/storm-starter/src/clj/storm/starter/clj/word_count.clj):
 
 ```clojure
 (defspout sentence-spout ["sentence"]

http://git-wip-us.apache.org/repos/asf/storm/blob/97f0558e/docs/Command-line-client.md
----------------------------------------------------------------------
diff --git a/docs/Command-line-client.md b/docs/Command-line-client.md
index 0e645d7..4634467 100644
--- a/docs/Command-line-client.md
+++ b/docs/Command-line-client.md
@@ -1,7 +1,9 @@
 ---
+title: Command Line Client
 layout: documentation
+documentation: true
 ---
-This page describes all the commands that are possible with the "storm" 
command line client. To learn how to set up your "storm" client to talk to a 
remote cluster, follow the instructions in [Setting up development 
environment](Setting-up-a-development-environment.html).
+This page describes all the commands that are possible with the "storm" 
command line client. To learn how to set up your "storm" client to talk to a 
remote cluster, follow the instructions in [Setting up development 
environment](Setting-up-development-environment.html).
 
 These commands are:
 
@@ -45,12 +47,14 @@ Deactivates the specified topology's spouts.
 
 ### rebalance
 
-Syntax: `storm rebalance topology-name [-w wait-time-secs]`
+Syntax: `storm rebalance topology-name [-w wait-time-secs] [-n 
new-num-workers] [-e component=parallelism]*`
 
 Sometimes you may wish to spread out where the workers for a topology are 
running. For example, let's say you have a 10 node cluster running 4 workers 
per node, and then let's say you add another 10 nodes to the cluster. You may 
wish to have Storm spread out the workers for the running topology so that each 
node runs 2 workers. One way to do this is to kill the topology and resubmit 
it, but Storm provides a "rebalance" command that provides an easier way to do 
this. 
 
 Rebalance will first deactivate the topology for the duration of the message 
timeout (overridable with the -w flag) and then redistribute the workers evenly 
around the cluster. The topology will then return to its previous state of 
activation (so a deactivated topology will still be deactivated and an 
activated topology will go back to being activated).
 
+The rebalance command can also be used to change the parallelism of a running 
topology. Use the -n and -e switches to change the number of workers or number 
of executors of a component respectively.
+
 ### repl
 
 Syntax: `storm repl`

http://git-wip-us.apache.org/repos/asf/storm/blob/97f0558e/docs/Common-patterns.md
----------------------------------------------------------------------
diff --git a/docs/Common-patterns.md b/docs/Common-patterns.md
index 3f8c979..1c97f6d 100644
--- a/docs/Common-patterns.md
+++ b/docs/Common-patterns.md
@@ -1,5 +1,7 @@
 ---
+title: Common Topology Patterns
 layout: documentation
+documentation: true
 ---
 
 This page lists a variety of common patterns in Storm topologies.
@@ -62,14 +64,26 @@ A common continuous computation done on Storm is a 
"streaming top N" of some sor
 This approach obviously doesn't scale to large streams since the entire stream 
has to go through one task. A better way to do the computation is to do many 
top N's in parallel across partitions of the stream, and then merge those top 
N's together to get the global top N. The pattern looks like this:
 
 ```java
-builder.setBolt("rank", new RankObjects(), parallellism)
+builder.setBolt("rank", new RankObjects(), parallelism)
   .fieldsGrouping("objects", new Fields("value"));
 builder.setBolt("merge", new MergeObjects())
   .globalGrouping("rank");
 ```
 
-This pattern works because of the fields grouping done by the first bolt which 
gives the partitioning you need for this to be semantically correct. You can 
see an example of this pattern in storm-starter 
[here](https://github.com/nathanmarz/storm-starter/blob/master/src/jvm/storm/starter/RollingTopWords.java).
+This pattern works because of the fields grouping done by the first bolt which 
gives the partitioning you need for this to be semantically correct. You can 
see an example of this pattern in storm-starter 
[here]({{page.git-blob-base}}/examples/storm-starter/src/jvm/storm/starter/RollingTopWords.java).
 
+If however you have a known skew in the data being processed it can be 
advantageous to use partialKeyGrouping instead of fieldsGrouping.  This will 
distribute the load for each key between two downstream bolts instead of a 
single one.
+
+```java
+builder.setBolt("count", new CountObjects(), parallelism)
+  .partialKeyGrouping("objects", new Fields("value"));
+builder.setBolt("rank" new AggregateCountsAndRank(), parallelism)
+  .fieldsGrouping("count", new Fields("key"))
+builder.setBolt("merge", new MergeRanksObjects())
+  .globalGrouping("rank");
+``` 
+
+The topology needs an extra layer of processing to aggregate the partial 
counts from the upstream bolts but this only processes aggregated values now so 
the bolt it is not subject to the load caused by the skewed data. You can see 
an example of this pattern in storm-starter 
[here]({{page.git-blob-base}}/examples/storm-starter/src/jvm/storm/starter/SkewedRollingTopWords.java).
 
 ### TimeCacheMap for efficiently keeping a cache of things that have been 
recently updated
 

http://git-wip-us.apache.org/repos/asf/storm/blob/97f0558e/docs/Concepts.md
----------------------------------------------------------------------
diff --git a/docs/Concepts.md b/docs/Concepts.md
index 33779f2..01dbd11 100644
--- a/docs/Concepts.md
+++ b/docs/Concepts.md
@@ -1,5 +1,7 @@
 ---
+title: Concepts
 layout: documentation
+documentation: true
 ---
 
 This page lists the main concepts of Storm and links to resources where you 
can find more information. The concepts discussed are:
@@ -35,14 +37,12 @@ Every stream is given an id when declared. Since 
single-stream spouts and bolts
 * [Tuple](javadocs/backtype/storm/tuple/Tuple.html): streams are composed of 
tuples
 * 
[OutputFieldsDeclarer](javadocs/backtype/storm/topology/OutputFieldsDeclarer.html):
 used to declare streams and their schemas
 * [Serialization](Serialization.html): Information about Storm's dynamic 
typing of tuples and declaring custom serializations
-* [ISerialization](javadocs/backtype/storm/serialization/ISerialization.html): 
custom serializers must implement this interface
-* 
[CONFIG.TOPOLOGY_SERIALIZATIONS](javadocs/backtype/storm/Config.html#TOPOLOGY_SERIALIZATIONS):
 custom serializers can be registered using this configuration
 
 ### Spouts
 
 A spout is a source of streams in a topology. Generally spouts will read 
tuples from an external source and emit them into the topology (e.g. a Kestrel 
queue or the Twitter API). Spouts can either be __reliable__ or __unreliable__. 
A reliable spout is capable of replaying a tuple if it failed to be processed 
by Storm, whereas an unreliable spout forgets about the tuple as soon as it is 
emitted.
 
-Spouts can emit more than one stream. To do so, declare multiple streams using 
the `declareStream` method of 
[OutputFieldsDeclarer](javadocs/backtype/storm/topology/OutputFieldsDeclarer.html)
 and specify the stream to emit to when using the `emit` method on 
[SpoutOutputCollector](javadocs/backtype/storm/spout/SpoutOutputCollector.html).
 
+Spouts can emit more than one stream. To do so, declare multiple streams using 
the `declareStream` method of 
[OutputFieldsDeclarer](javadocs/backtype/storm/topology/OutputFieldsDeclarer.html)
 and specify the stream to emit to when using the `emit` method on 
[SpoutOutputCollector](javadocs/backtype/storm/spout/SpoutOutputCollector.html).
 
 The main method on spouts is `nextTuple`. `nextTuple` either emits a new tuple 
into the topology or simply returns if there are no new tuples to emit. It is 
imperative that `nextTuple` does not block for any spout implementation, 
because Storm calls all the spout methods on the same thread.
 
@@ -50,7 +50,7 @@ The other main methods on spouts are `ack` and `fail`. These 
are called when Sto
 
 **Resources:**
 
-* [IRichSpout](javadocs/backtype/storm/topology/IRichSpout.html): this is the 
interface that spouts must implement. 
+* [IRichSpout](javadocs/backtype/storm/topology/IRichSpout.html): this is the 
interface that spouts must implement.
 * [Guaranteeing message processing](Guaranteeing-message-processing.html)
 
 ### Bolts
@@ -61,7 +61,7 @@ Bolts can do simple stream transformations. Doing complex 
stream transformations
 
 Bolts can emit more than one stream. To do so, declare multiple streams using 
the `declareStream` method of 
[OutputFieldsDeclarer](javadocs/backtype/storm/topology/OutputFieldsDeclarer.html)
 and specify the stream to emit to when using the `emit` method on 
[OutputCollector](javadocs/backtype/storm/task/OutputCollector.html).
 
-When you declare a bolt's input streams, you always subscribe to specific 
streams of another component. If you want to subscribe to all the streams of 
another component, you have to subscribe to each one individually. 
[InputDeclarer](javadocs/backtype/storm/topology/InputDeclarer.html) has 
syntactic sugar for subscribing to streams declared on the default stream id. 
Saying `declarer.shuffleGrouping("1")` subscribes to the default stream on 
component "1" and is equivalent to `declarer.shuffleGrouping("1", 
DEFAULT_STREAM_ID)`. 
+When you declare a bolt's input streams, you always subscribe to specific 
streams of another component. If you want to subscribe to all the streams of 
another component, you have to subscribe to each one individually. 
[InputDeclarer](javadocs/backtype/storm/topology/InputDeclarer.html) has 
syntactic sugar for subscribing to streams declared on the default stream id. 
Saying `declarer.shuffleGrouping("1")` subscribes to the default stream on 
component "1" and is equivalent to `declarer.shuffleGrouping("1", 
DEFAULT_STREAM_ID)`.
 
 The main method in bolts is the `execute` method which takes in as input a new 
tuple. Bolts emit new tuples using the 
[OutputCollector](javadocs/backtype/storm/task/OutputCollector.html) object. 
Bolts must call the `ack` method on the `OutputCollector` for every tuple they 
process so that Storm knows when tuples are completed (and can eventually 
determine that its safe to ack the original spout tuples). For the common case 
of processing an input tuple, emitting 0 or more tuples based on that tuple, 
and then acking the input tuple, Storm provides an 
[IBasicBolt](javadocs/backtype/storm/topology/IBasicBolt.html) interface which 
does the acking automatically.
 
@@ -78,21 +78,21 @@ Its perfectly fine to launch new threads in bolts that do 
processing asynchronou
 
 Part of defining a topology is specifying for each bolt which streams it 
should receive as input. A stream grouping defines how that stream should be 
partitioned among the bolt's tasks.
 
-There are seven built-in stream groupings in Storm, and you can implement a 
custom stream grouping by implementing the 
[CustomStreamGrouping](javadocs/backtype/storm/grouping/CustomStreamGrouping.html)
 interface:
+There are eight built-in stream groupings in Storm, and you can implement a 
custom stream grouping by implementing the 
[CustomStreamGrouping](javadocs/backtype/storm/grouping/CustomStreamGrouping.html)
 interface:
 
 1. **Shuffle grouping**: Tuples are randomly distributed across the bolt's 
tasks in a way such that each bolt is guaranteed to get an equal number of 
tuples.
 2. **Fields grouping**: The stream is partitioned by the fields specified in 
the grouping. For example, if the stream is grouped by the "user-id" field, 
tuples with the same "user-id" will always go to the same task, but tuples with 
different "user-id"'s may go to different tasks.
-3. **All grouping**: The stream is replicated across all the bolt's tasks. Use 
this grouping with care.
-4. **Global grouping**: The entire stream goes to a single one of the bolt's 
tasks. Specifically, it goes to the task with the lowest id.
-5. **None grouping**: This grouping specifies that you don't care how the 
stream is grouped. Currently, none groupings are equivalent to shuffle 
groupings. Eventually though, Storm will push down bolts with none groupings to 
execute in the same thread as the bolt or spout they subscribe from (when 
possible).
-6. **Direct grouping**: This is a special kind of grouping. A stream grouped 
this way means that the __producer__ of the tuple decides which task of the 
consumer will receive this tuple. Direct groupings can only be declared on 
streams that have been declared as direct streams. Tuples emitted to a direct 
stream must be emitted using one of the 
[emitDirect](javadocs/backtype/storm/task/OutputCollector.html#emitDirect(int, 
int, java.util.List) methods. A bolt can get the task ids of its consumers by 
either using the provided 
[TopologyContext](javadocs/backtype/storm/task/TopologyContext.html) or by 
keeping track of the output of the `emit` method in 
[OutputCollector](javadocs/backtype/storm/task/OutputCollector.html) (which 
returns the task ids that the tuple was sent to).  
-7. **Local or shuffle grouping**: If the target bolt has one or more tasks in 
the same worker process, tuples will be shuffled to just those in-process 
tasks. Otherwise, this acts like a normal shuffle grouping.
+3. **Partial Key grouping**: The stream is partitioned by the fields specified 
in the grouping, like the Fields grouping, but are load balanced between two 
downstream bolts, which provides better utilization of resources when the 
incoming data is skewed. [This 
paper](https://melmeric.files.wordpress.com/2014/11/the-power-of-both-choices-practical-load-balancing-for-distributed-stream-processing-engines.pdf)
 provides a good explanation of how it works and the advantages it provides.
+4. **All grouping**: The stream is replicated across all the bolt's tasks. Use 
this grouping with care.
+5. **Global grouping**: The entire stream goes to a single one of the bolt's 
tasks. Specifically, it goes to the task with the lowest id.
+6. **None grouping**: This grouping specifies that you don't care how the 
stream is grouped. Currently, none groupings are equivalent to shuffle 
groupings. Eventually though, Storm will push down bolts with none groupings to 
execute in the same thread as the bolt or spout they subscribe from (when 
possible).
+7. **Direct grouping**: This is a special kind of grouping. A stream grouped 
this way means that the __producer__ of the tuple decides which task of the 
consumer will receive this tuple. Direct groupings can only be declared on 
streams that have been declared as direct streams. Tuples emitted to a direct 
stream must be emitted using one of the 
[emitDirect](javadocs/backtype/storm/task/OutputCollector.html#emitDirect(int, 
int, java.util.List) methods. A bolt can get the task ids of its consumers by 
either using the provided 
[TopologyContext](javadocs/backtype/storm/task/TopologyContext.html) or by 
keeping track of the output of the `emit` method in 
[OutputCollector](javadocs/backtype/storm/task/OutputCollector.html) (which 
returns the task ids that the tuple was sent to).
+8. **Local or shuffle grouping**: If the target bolt has one or more tasks in 
the same worker process, tuples will be shuffled to just those in-process 
tasks. Otherwise, this acts like a normal shuffle grouping.
 
 **Resources:**
 
 * [TopologyBuilder](javadocs/backtype/storm/topology/TopologyBuilder.html): 
use this class to define topologies
 * [InputDeclarer](javadocs/backtype/storm/topology/InputDeclarer.html): this 
object is returned whenever `setBolt` is called on `TopologyBuilder` and is 
used for declaring a bolt's input streams and how those streams should be 
grouped
-* [CoordinatedBolt](javadocs/backtype/storm/task/CoordinatedBolt.html): this 
bolt is useful for distributed RPC topologies and makes heavy use of direct 
streams and direct groupings
 
 ### Reliability
 
@@ -104,7 +104,7 @@ This is all explained in much more detail in [Guaranteeing 
message processing](G
 
 ### Tasks
 
-Each spout or bolt executes as many tasks across the cluster. Each task 
corresponds to one thread of execution, and stream groupings define how to send 
tuples from one set of tasks to another set of tasks. You set the parallelism 
for each spout or bolt in the `setSpout` and `setBolt` methods of 
[TopologyBuilder](javadocs/backtype/storm/topology/TopologyBuilder.html). 
+Each spout or bolt executes as many tasks across the cluster. Each task 
corresponds to one thread of execution, and stream groupings define how to send 
tuples from one set of tasks to another set of tasks. You set the parallelism 
for each spout or bolt in the `setSpout` and `setBolt` methods of 
[TopologyBuilder](javadocs/backtype/storm/topology/TopologyBuilder.html).
 
 ### Workers
 

http://git-wip-us.apache.org/repos/asf/storm/blob/97f0558e/docs/Configuration.md
----------------------------------------------------------------------
diff --git a/docs/Configuration.md b/docs/Configuration.md
index 8e8ca77..83f4ef7 100644
--- a/docs/Configuration.md
+++ b/docs/Configuration.md
@@ -1,9 +1,11 @@
 ---
+title: Configuration
 layout: documentation
+documentation: true
 ---
 Storm has a variety of configurations for tweaking the behavior of nimbus, 
supervisors, and running topologies. Some configurations are system 
configurations and cannot be modified on a topology by topology basis, whereas 
other configurations can be modified per topology. 
 
-Every configuration has a default value defined in 
[defaults.yaml](https://github.com/apache/incubator-storm/blob/master/conf/defaults.yaml)
 in the Storm codebase. You can override these configurations by defining a 
storm.yaml in the classpath of Nimbus and the supervisors. Finally, you can 
define a topology-specific configuration that you submit along with your 
topology when using 
[StormSubmitter](javadocs/backtype/storm/StormSubmitter.html). However, the 
topology-specific configuration can only override configs prefixed with 
"TOPOLOGY".
+Every configuration has a default value defined in 
[defaults.yaml]({{page.git-blob-base}}/conf/defaults.yaml) in the Storm 
codebase. You can override these configurations by defining a storm.yaml in the 
classpath of Nimbus and the supervisors. Finally, you can define a 
topology-specific configuration that you submit along with your topology when 
using [StormSubmitter](javadocs/backtype/storm/StormSubmitter.html). However, 
the topology-specific configuration can only override configs prefixed with 
"TOPOLOGY".
 
 Storm 0.7.0 and onwards lets you override configuration on a 
per-bolt/per-spout basis. The only configurations that can be overriden this 
way are:
 
@@ -23,7 +25,7 @@ The preference order for configuration values is 
defaults.yaml < storm.yaml < to
 **Resources:**
 
 * [Config](javadocs/backtype/storm/Config.html): a listing of all 
configurations as well as a helper class for creating topology specific 
configurations
-* 
[defaults.yaml](https://github.com/apache/incubator-storm/blob/master/conf/defaults.yaml):
 the default values for all configurations
+* [defaults.yaml]({{page.git-blob-base}}/conf/defaults.yaml): the default 
values for all configurations
 * [Setting up a Storm cluster](Setting-up-a-Storm-cluster.html): explains how 
to create and configure a Storm cluster
 * [Running topologies on a production 
cluster](Running-topologies-on-a-production-cluster.html): lists useful 
configurations when running topologies on a cluster
 * [Local mode](Local-mode.html): lists useful configurations when using local 
mode

http://git-wip-us.apache.org/repos/asf/storm/blob/97f0558e/docs/Contributing-to-Storm.md
----------------------------------------------------------------------
diff --git a/docs/Contributing-to-Storm.md b/docs/Contributing-to-Storm.md
index dff23fb..fdc5835 100644
--- a/docs/Contributing-to-Storm.md
+++ b/docs/Contributing-to-Storm.md
@@ -1,5 +1,7 @@
 ---
+title: Contributing
 layout: documentation
+documentation: true
 ---
 
 ### Getting started with contributing
@@ -12,7 +14,7 @@ The [Implementation docs](Implementation-docs.html) section 
of the wiki gives de
 
 ### Contribution process
 
-Contributions to the Storm codebase should be sent as GitHub pull requests. If 
there's any problems to the pull request we can iterate on it using GitHub's 
commenting features.
+Contributions to the Storm codebase should be sent as 
[GitHub](https://github.com/apache/storm) pull requests. If there's any 
problems to the pull request we can iterate on it using GitHub's commenting 
features.
 
 For small patches, feel free to submit pull requests directly for them. For 
larger contributions, please use the following process. The idea behind this 
process is to prevent any wasted work and catch design issues early on:
 

http://git-wip-us.apache.org/repos/asf/storm/blob/97f0558e/docs/Creating-a-new-Storm-project.md
----------------------------------------------------------------------
diff --git a/docs/Creating-a-new-Storm-project.md 
b/docs/Creating-a-new-Storm-project.md
index feb49b8..35ab1eb 100644
--- a/docs/Creating-a-new-Storm-project.md
+++ b/docs/Creating-a-new-Storm-project.md
@@ -1,18 +1,18 @@
 ---
+title: Creating a New Storm Project
 layout: documentation
+documentation: true
 ---
 This page outlines how to set up a Storm project for development. The steps 
are:
 
 1. Add Storm jars to classpath
 2. If using multilang, add multilang dir to classpath
 
-Follow along to see how to set up the 
[storm-starter](http://github.com/nathanmarz/storm-starter) project in Eclipse.
+Follow along to see how to set up the 
[storm-starter]({{page.git-blob-base}}/examples/storm-starter) project in 
Eclipse.
 
 ### Add Storm jars to classpath
 
-You'll need the Storm jars on your classpath to develop Storm topologies. 
Using [Maven](Maven.html) is highly recommended. [Here's an 
example](https://github.com/nathanmarz/storm-starter/blob/master/m2-pom.xml) of 
how to setup your pom.xml for a Storm project. If you don't want to use Maven, 
you can include the jars from the Storm release on your classpath. 
-
-[storm-starter](http://github.com/nathanmarz/storm-starter) uses 
[Leiningen](http://github.com/technomancy/leiningen) for build and dependency 
resolution. You can install leiningen by downloading [this 
script](https://raw.github.com/technomancy/leiningen/stable/bin/lein), placing 
it on your path, and making it executable. To retrieve the dependencies for 
Storm, simply run `lein deps` in the project root.
+You'll need the Storm jars on your classpath to develop Storm topologies. 
Using [Maven](Maven.html) is highly recommended. [Here's an 
example]({{page.git-blob-base}}/examples/storm-starter/pom.xml) of how to setup 
your pom.xml for a Storm project. If you don't want to use Maven, you can 
include the jars from the Storm release on your classpath.
 
 To set up the classpath in Eclipse, create a new Java project, include 
`src/jvm/` as a source path, and make sure all the jars in `lib/` and 
`lib/dev/` are in the `Referenced Libraries` section of the project.
 

http://git-wip-us.apache.org/repos/asf/storm/blob/97f0558e/docs/DSLs-and-multilang-adapters.md
----------------------------------------------------------------------
diff --git a/docs/DSLs-and-multilang-adapters.md 
b/docs/DSLs-and-multilang-adapters.md
index 31bd453..0ed5450 100644
--- a/docs/DSLs-and-multilang-adapters.md
+++ b/docs/DSLs-and-multilang-adapters.md
@@ -1,9 +1,10 @@
 ---
+title: Storm DSLs and Multi-Lang Adapters
 layout: documentation
+documentation: true
 ---
 * [Scala DSL](https://github.com/velvia/ScalaStorm)
 * [JRuby DSL](https://github.com/colinsurprenant/redstorm)
 * [Clojure DSL](Clojure-DSL.html)
 * [Storm/Esper integration](https://github.com/tomdz/storm-esper): Streaming 
SQL on top of Storm
-* [io-storm](https://github.com/gphat/io-storm): Perl multilang adapter
-* [storm-php](https://github.com/lazyshot/storm-php): PHP multilang adapter
+* [io-storm](https://github.com/dan-blanchard/io-storm): Perl multilang adapter

http://git-wip-us.apache.org/repos/asf/storm/blob/97f0558e/docs/Daemon-Fault-Tolerance.md
----------------------------------------------------------------------
diff --git a/docs/Daemon-Fault-Tolerance.md b/docs/Daemon-Fault-Tolerance.md
new file mode 100644
index 0000000..1af681a
--- /dev/null
+++ b/docs/Daemon-Fault-Tolerance.md
@@ -0,0 +1,30 @@
+---
+title: Daemon Fault Tolerance
+layout: documentation
+documentation: true
+---
+Storm has several different daemon processes.  Nimbus that schedules workers, 
supervisors that launch and kill workers, the log viewer that gives access to 
logs, and the UI that shows the status of a cluster.
+
+## What happens when a worker dies?
+
+When a worker dies, the supervisor will restart it. If it continuously fails 
on startup and is unable to heartbeat to Nimbus, Nimbus will reschedule the 
worker.
+
+## What happens when a node dies?
+
+The tasks assigned to that machine will time-out and Nimbus will reassign 
those tasks to other machines.
+
+## What happens when Nimbus or Supervisor daemons die?
+
+The Nimbus and Supervisor daemons are designed to be fail-fast (process 
self-destructs whenever any unexpected situation is encountered) and stateless 
(all state is kept in Zookeeper or on disk). As described in [Setting up a 
Storm cluster](Setting-up-a-Storm-cluster.html), the Nimbus and Supervisor 
daemons must be run under supervision using a tool like daemontools or monit. 
So if the Nimbus or Supervisor daemons die, they restart like nothing happened.
+
+Most notably, no worker processes are affected by the death of Nimbus or the 
Supervisors. This is in contrast to Hadoop, where if the JobTracker dies, all 
the running jobs are lost. 
+
+## Is Nimbus a single point of failure?
+
+If you lose the Nimbus node, the workers will still continue to function. 
Additionally, supervisors will continue to restart workers if they die. 
However, without Nimbus, workers won't be reassigned to other machines when 
necessary (like if you lose a worker machine). 
+
+So the answer is that Nimbus is "sort of" a SPOF. In practice, it's not a big 
deal since nothing catastrophic happens when the Nimbus daemon dies. There are 
plans to make Nimbus highly available in the future.
+
+## How does Storm guarantee data processing?
+
+Storm provides mechanisms to guarantee data processing even if nodes die or 
messages are lost. See [Guaranteeing message 
processing](Guaranteeing-message-processing.html) for the details.

http://git-wip-us.apache.org/repos/asf/storm/blob/97f0558e/docs/Defining-a-non-jvm-language-dsl-for-storm.md
----------------------------------------------------------------------
diff --git a/docs/Defining-a-non-jvm-language-dsl-for-storm.md 
b/docs/Defining-a-non-jvm-language-dsl-for-storm.md
index f52f4ab..7096a43 100644
--- a/docs/Defining-a-non-jvm-language-dsl-for-storm.md
+++ b/docs/Defining-a-non-jvm-language-dsl-for-storm.md
@@ -1,7 +1,9 @@
 ---
+title: Defining a Non-JVM DSL for Storm
 layout: documentation
+documentation: true
 ---
-The right place to start to learn how to make a non-JVM DSL for Storm is 
[storm-core/src/storm.thrift](https://github.com/apache/incubator-storm/blob/master/storm-core/src/storm.thrift).
 Since Storm topologies are just Thrift structures, and Nimbus is a Thrift 
daemon, you can create and submit topologies in any language.
+The right place to start to learn how to make a non-JVM DSL for Storm is 
[storm-core/src/storm.thrift]({{page.git-blob-base}}/storm-core/src/storm.thrift).
 Since Storm topologies are just Thrift structures, and Nimbus is a Thrift 
daemon, you can create and submit topologies in any language.
 
 When you create the Thrift structs for spouts and bolts, the code for the 
spout or bolt is specified in the ComponentObject struct:
 

http://git-wip-us.apache.org/repos/asf/storm/blob/97f0558e/docs/Distributed-RPC.md
----------------------------------------------------------------------
diff --git a/docs/Distributed-RPC.md b/docs/Distributed-RPC.md
index fc75ee4..4af2702 100644
--- a/docs/Distributed-RPC.md
+++ b/docs/Distributed-RPC.md
@@ -1,5 +1,7 @@
 ---
+title: Distributed RPC
 layout: documentation
+documentation: true
 ---
 The idea behind distributed RPC (DRPC) is to parallelize the computation of 
really intense functions on the fly using Storm. The Storm topology takes in as 
input a stream of function arguments, and it emits an output stream of the 
results for each of those function calls. 
 
@@ -116,7 +118,7 @@ The reach of a URL is the number of unique people exposed 
to a URL on Twitter. T
 
 A single reach computation can involve thousands of database calls and tens of 
millions of follower records during the computation. It's a really, really 
intense computation. As you're about to see, implementing this function on top 
of Storm is dead simple. On a single machine, reach can take minutes to 
compute; on a Storm cluster, you can compute reach for even the hardest URLs in 
a couple seconds.
 
-A sample reach topology is defined in storm-starter 
[here](https://github.com/nathanmarz/storm-starter/blob/master/src/jvm/storm/starter/ReachTopology.java).
 Here's how you define the reach topology:
+A sample reach topology is defined in storm-starter 
[here]({{page.git-blob-base}}/examples/storm-starter/src/jvm/storm/starter/ReachTopology.java).
 Here's how you define the reach topology:
 
 ```java
 LinearDRPCTopologyBuilder builder = new LinearDRPCTopologyBuilder("reach");

http://git-wip-us.apache.org/repos/asf/storm/blob/97f0558e/docs/FAQ.md
----------------------------------------------------------------------
diff --git a/docs/FAQ.md b/docs/FAQ.md
index 8ff7a6f..127c95c 100644
--- a/docs/FAQ.md
+++ b/docs/FAQ.md
@@ -1,5 +1,7 @@
 ---
+title: FAQ
 layout: documentation
+documentation: true
 ---
 
 ## Best Practices
@@ -26,7 +28,7 @@ layout: documentation
 
 ### Halp! I cannot see:
 
-* **my logs** Logs by default go to $STORM_HOME/logs. Check that you have 
write permissions to that directory. They are configured in the 
logback/cluster.xml (0.9) and log4j/*.properties in earlier versions.
+* **my logs** Logs by default go to $STORM_HOME/logs. Check that you have 
write permissions to that directory. They are configured in log4j2/{cluster, 
worker}.xml.
 * **final JVM settings** Add the `-XX+PrintFlagsFinal` commandline option in 
the childopts (see the conf file)
 * **final Java system properties** Add `Properties props = 
System.getProperties(); props.list(System.out);` near where you build your 
topology.
 
@@ -60,6 +62,10 @@ You can join streams with join, merge or multiReduce.
 
 At time of writing, you can't emit to multiple output streams from Trident -- 
see [STORM-68](https://issues.apache.org/jira/browse/STORM-68)
 
+### Why am I getting a NotSerializableException/IllegalStateException when my 
topology is being started up?
+
+Within the Storm lifecycle, the topology is instantiated and then serialized 
to byte format to be stored in ZooKeeper, prior to the topology being executed. 
Within this step, if a spout or bolt within the topology has an initialized 
unserializable property, serialization will fail. If there is a need for a 
field that is unserializable, initialize it within the bolt or spout's prepare 
method, which is run after the topology is delivered to the worker.
+
 ## Spouts
 
 ### What is a coordinator, and why are there several?
@@ -72,11 +78,11 @@ You should only store static data, and as little of it as 
possible, into the met
 
 ### How often is the 'emitPartitionBatchNew' function called?
 
-Since the MBC is the actual spout, all the tuples in a batch are just members 
of its tupletree. That means storm's "max spout pending" config effectively 
defines the number of concurrent batches trident runs. The MBC emits a new 
batch if it has fewer than max-spending tuples pending and if at least one 
[trident batch 
interval](https://github.com/apache/incubator-storm/blob/master/conf/defaults.yaml#L115)'s
 worth of seconds has passed since the last batch.
+Since the MBC is the actual spout, all the tuples in a batch are just members 
of its tupletree. That means storm's "max spout pending" config effectively 
defines the number of concurrent batches trident runs. The MBC emits a new 
batch if it has fewer than max-spending tuples pending and if at least one 
[trident batch interval]({{page.git-blob-base}}/conf/defaults.yaml#L115)'s 
worth of seconds has passed since the last batch.
 
 ### If nothing was emitted does Trident slow down the calls?
 
-Yes, there's a pluggable "spout wait strategy"; the default is to sleep for a 
[configurable amount of 
time](https://github.com/apache/incubator-storm/blob/master/conf/defaults.yaml#L110)
+Yes, there's a pluggable "spout wait strategy"; the default is to sleep for a 
[configurable amount of time]({{page.git-blob-base}}/conf/defaults.yaml#L110)
 
 ### OK, then what is the trident batch interval for?
 
@@ -107,7 +113,7 @@ You can't change the overall batch size once generated, but 
you can change the n
 
 ### How do I aggregate events by time?
 
-If have records with an immutable timestamp, and you would like to count, 
average or otherwise aggregate them into discrete time buckets, Trident is an 
excellent and scalable solution.
+If you have records with an immutable timestamp, and you would like to count, 
average or otherwise aggregate them into discrete time buckets, Trident is an 
excellent and scalable solution.
 
 Write an `Each` function that turns the timestamp into a time bucket: if the 
bucket size was "by hour", then the timestamp `2013-08-08 12:34:56` would be 
mapped to the `2013-08-08 12:00:00` time bucket, and so would everything else 
in the twelve o'clock hour. Then group on that timebucket and use a grouped 
persistentAggregate. The persistentAggregate uses a local cacheMap backed by a 
data store. Groups with many records require very few reads from the data 
store, and use efficient bulk reads and writes; as long as your data feed is 
relatively prompt Trident will make very efficient use of memory and network. 
Even if a server drops off line for a day, then delivers that full day's worth 
of data in a rush, the old results will be calmly retrieved and updated -- and 
without interfering with calculating the current results.
 

http://git-wip-us.apache.org/repos/asf/storm/blob/97f0558e/docs/Guaranteeing-message-processing.md
----------------------------------------------------------------------
diff --git a/docs/Guaranteeing-message-processing.md 
b/docs/Guaranteeing-message-processing.md
index 91d4384..932ff0d 100644
--- a/docs/Guaranteeing-message-processing.md
+++ b/docs/Guaranteeing-message-processing.md
@@ -1,7 +1,10 @@
 ---
+title: Guaranteeing Message Processing
 layout: documentation
+documentation: true
 ---
-Storm guarantees that each message coming off a spout will be fully processed. 
This page describes how Storm accomplishes this guarantee and what you have to 
do as a user to benefit from Storm's reliability capabilities.
+Storm offers several different levels of guaranteed message processing, 
includeing best effort, at least once, and exactly once through 
[Trident](Trident-tutorial.html). 
+This page describes how Storm can guarantee at least once processing.
 
 ### What does it mean for a message to be "fully processed"?
 
@@ -129,12 +132,11 @@ In contrast, bolts that do aggregations or joins may 
delay acking a tuple until
 
 ### How do I make my applications work correctly given that tuples can be 
replayed?
 
-As always in software design, the answer is "it depends." Storm 0.7.0 
introduced the "transactional topologies" feature, which enables you to get 
fully fault-tolerant exactly-once messaging semantics for most computations. 
Read more about transactional topologies [here](Transactional-topologies.html). 
-
+As always in software design, the answer is "it depends." If you really want 
exactly once semantics use the [Trident](Trident-tutorial.html) API. In some 
cases, like with a lot of analytics, dropping data is OK so disabling the fault 
tolerance by setting the number of acker bolts to 0 
[Config.TOPOLOGY_ACKERS](javadocs/backtype/storm/Config.html#TOPOLOGY_ACKERS).  
But in some cases you want to be sure that everything was processed at least 
once and nothing was dropped.  This is especially useful if all operations are 
idenpotent or if deduping can happen aferwards.
 
 ### How does Storm implement reliability in an efficient way?
 
-A Storm topology has a set of special "acker" tasks that track the DAG of 
tuples for every spout tuple. When an acker sees that a DAG is complete, it 
sends a message to the spout task that created the spout tuple to ack the 
message. You can set the number of acker tasks for a topology in the topology 
configuration using 
[Config.TOPOLOGY_ACKERS](javadocs/backtype/storm/Config.html#TOPOLOGY_ACKERS). 
Storm defaults TOPOLOGY_ACKERS to one task -- you will need to increase this 
number for topologies processing large amounts of messages. 
+A Storm topology has a set of special "acker" tasks that track the DAG of 
tuples for every spout tuple. When an acker sees that a DAG is complete, it 
sends a message to the spout task that created the spout tuple to ack the 
message. You can set the number of acker tasks for a topology in the topology 
configuration using 
[Config.TOPOLOGY_ACKERS](javadocs/backtype/storm/Config.html#TOPOLOGY_ACKERS). 
Storm defaults TOPOLOGY_ACKERS to one task per worker.
 
 The best way to understand Storm's reliability implementation is to look at 
the lifecycle of tuples and tuple DAGs. When a tuple is created in a topology, 
whether in a spout or a bolt, it is given a random 64 bit id. These ids are 
used by ackers to track the tuple DAG for every spout tuple.
 

http://git-wip-us.apache.org/repos/asf/storm/blob/97f0558e/docs/Hooks.md
----------------------------------------------------------------------
diff --git a/docs/Hooks.md b/docs/Hooks.md
index bbe87a9..01cfa92 100644
--- a/docs/Hooks.md
+++ b/docs/Hooks.md
@@ -1,7 +1,9 @@
 ---
+title: Hooks
 layout: documentation
+documentation: true
 ---
 Storm provides hooks with which you can insert custom code to run on any 
number of events within Storm. You create a hook by extending the 
[BaseTaskHook](javadocs/backtype/storm/hooks/BaseTaskHook.html) class and 
overriding the appropriate method for the event you want to catch. There are 
two ways to register your hook:
 
-1. In the open method of your spout or prepare method of your bolt using the 
[TopologyContext#addTaskHook](javadocs/backtype/storm/task/TopologyContext.html)
 method.
+1. In the open method of your spout or prepare method of your bolt using the 
[TopologyContext](javadocs/backtype/storm/task/TopologyContext.html#addTaskHook)
 method.
 2. Through the Storm configuration using the 
["topology.auto.task.hooks"](javadocs/backtype/storm/Config.html#TOPOLOGY_AUTO_TASK_HOOKS)
 config. These hooks are automatically registered in every spout or bolt, and 
are useful for doing things like integrating with a custom monitoring system.

http://git-wip-us.apache.org/repos/asf/storm/blob/97f0558e/docs/Implementation-docs.md
----------------------------------------------------------------------
diff --git a/docs/Implementation-docs.md b/docs/Implementation-docs.md
index f01083a..15f088e 100644
--- a/docs/Implementation-docs.md
+++ b/docs/Implementation-docs.md
@@ -1,12 +1,13 @@
 ---
+title: Storm Internal Implementation
 layout: documentation
+documentation: true
 ---
 This section of the wiki is dedicated to explaining how Storm is implemented. 
You should have a good grasp of how to use Storm before reading these sections. 
 
 - [Structure of the codebase](Structure-of-the-codebase.html)
 - [Lifecycle of a topology](Lifecycle-of-a-topology.html)
 - [Message passing implementation](Message-passing-implementation.html)
-- [Acking framework implementation](Acking-framework-implementation.html)
 - [Metrics](Metrics.html)
 - How transactional topologies work
    - subtopology for TransactionalSpout

http://git-wip-us.apache.org/repos/asf/storm/blob/97f0558e/docs/Kestrel-and-Storm.md
----------------------------------------------------------------------
diff --git a/docs/Kestrel-and-Storm.md b/docs/Kestrel-and-Storm.md
index e16b0d9..cb80139 100644
--- a/docs/Kestrel-and-Storm.md
+++ b/docs/Kestrel-and-Storm.md
@@ -1,11 +1,13 @@
 ---
+title: Storm and Kestrel
 layout: documentation
+documentation: true
 ---
 This page explains how to use to Storm to consume items from a Kestrel cluster.
 
 ## Preliminaries
 ### Storm
-This tutorial uses examples from the 
[storm-kestrel](https://github.com/nathanmarz/storm-kestrel) project and the 
[storm-starter](https://github.com/nathanmarz/storm-starter) project. It's 
recommended that you clone those projects and follow along with the examples. 
Read [Setting up development 
environment](https://github.com/apache/incubator-storm/wiki/Setting-up-development-environment)
 and [Creating a new Storm 
project](https://github.com/apache/incubator-storm/wiki/Creating-a-new-Storm-project)
 to get your machine set up.
+This tutorial uses examples from the 
[storm-kestrel](https://github.com/nathanmarz/storm-kestrel) project and the 
[storm-starter](http://github.com/apache/storm/blob/{{page.version}}/examples/storm-starter)
 project. It's recommended that you clone those projects and follow along with 
the examples. Read [Setting up development 
environment](Setting-up-development-environment.html) and [Creating a new Storm 
project](Creating-a-new-Storm-project.html) to get your machine set up.
 ### Kestrel
 It assumes you are able to run locally a Kestrel server as described 
[here](https://github.com/nathanmarz/storm-kestrel).
 

http://git-wip-us.apache.org/repos/asf/storm/blob/97f0558e/docs/Lifecycle-of-a-topology.md
----------------------------------------------------------------------
diff --git a/docs/Lifecycle-of-a-topology.md b/docs/Lifecycle-of-a-topology.md
index 4919be8..6436206 100644
--- a/docs/Lifecycle-of-a-topology.md
+++ b/docs/Lifecycle-of-a-topology.md
@@ -1,5 +1,7 @@
 ---
+title: Lifecycle of a Storm Topology
 layout: documentation
+documentation: true
 ---
 (**NOTE**: this page is based on the 0.7.1 code; many things have changed 
since then, including a split between tasks and executors, and a reorganization 
of the code under `storm-core/src` rather than `src/`.)
 
@@ -7,74 +9,74 @@ This page explains in detail the lifecycle of a topology from 
running the "storm
 
 First a couple of important notes about topologies:
 
-1. The actual topology that runs is different than the topology the user 
specifies. The actual topology has implicit streams and an implicit "acker" 
bolt added to manage the acking framework (used to guarantee data processing). 
The implicit topology is created via the 
[system-topology!](https://github.com/apache/incubator-storm/blob/0.7.1/src/clj/backtype/storm/daemon/common.clj#L188)
 function.
+1. The actual topology that runs is different than the topology the user 
specifies. The actual topology has implicit streams and an implicit "acker" 
bolt added to manage the acking framework (used to guarantee data processing). 
The implicit topology is created via the 
[system-topology!](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/common.clj#L188)
 function.
 2. `system-topology!` is used in two places:
-  - when Nimbus is creating tasks for the topology 
[code](https://github.com/apache/incubator-storm/blob/0.7.1/src/clj/backtype/storm/daemon/nimbus.clj#L316)
-  - in the worker so it knows where it needs to route messages to 
[code](https://github.com/apache/incubator-storm/blob/0.7.1/src/clj/backtype/storm/daemon/worker.clj#L90)
+  - when Nimbus is creating tasks for the topology 
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/nimbus.clj#L316)
+  - in the worker so it knows where it needs to route messages to 
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/worker.clj#L90)
 
 ## Starting a topology
 
-- "storm jar" command executes your class with the specified arguments. The 
only special thing that "storm jar" does is set the "storm.jar" environment 
variable for use by `StormSubmitter` later. 
[code](https://github.com/apache/incubator-storm/blob/0.7.1/bin/storm#L101)
+- "storm jar" command executes your class with the specified arguments. The 
only special thing that "storm jar" does is set the "storm.jar" environment 
variable for use by `StormSubmitter` later. 
[code](https://github.com/apache/storm/blob/0.7.1/bin/storm#L101)
 - When your code uses `StormSubmitter.submitTopology`, `StormSubmitter` takes 
the following actions:
-  - First, `StormSubmitter` uploads the jar if it hasn't been uploaded before. 
[code](https://github.com/apache/incubator-storm/blob/0.7.1/src/jvm/backtype/storm/StormSubmitter.java#L83)
-    - Jar uploading is done via Nimbus's Thrift interface 
[code](https://github.com/apache/incubator-storm/blob/0.7.1/src/storm.thrift#L200)
+  - First, `StormSubmitter` uploads the jar if it hasn't been uploaded before. 
[code](https://github.com/apache/storm/blob/0.7.1/src/jvm/backtype/storm/StormSubmitter.java#L83)
+    - Jar uploading is done via Nimbus's Thrift interface 
[code](https://github.com/apache/storm/blob/0.7.1/src/storm.thrift#L200)
     - `beginFileUpload` returns a path in Nimbus's inbox
     - 15 kilobytes are uploaded at a time through `uploadChunk`
     - `finishFileUpload` is called when it's finished uploading
-    - Here is Nimbus's implementation of those Thrift methods: 
[code](https://github.com/apache/incubator-storm/blob/0.7.1/src/clj/backtype/storm/daemon/nimbus.clj#L694)
-  - Second, `StormSubmitter` calls `submitTopology` on the Nimbus thrift 
interface 
[code](https://github.com/apache/incubator-storm/blob/0.7.1/src/jvm/backtype/storm/StormSubmitter.java#L60)
+    - Here is Nimbus's implementation of those Thrift methods: 
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/nimbus.clj#L694)
+  - Second, `StormSubmitter` calls `submitTopology` on the Nimbus thrift 
interface 
[code](https://github.com/apache/storm/blob/0.7.1/src/jvm/backtype/storm/StormSubmitter.java#L60)
     - The topology config is serialized using JSON (JSON is used so that 
writing DSL's in any language is as easy as possible)
     - Notice that the Thrift `submitTopology` call takes in the Nimbus inbox 
path where the jar was uploaded
 
-- Nimbus receives the topology submission. 
[code](https://github.com/apache/incubator-storm/blob/0.7.1/src/clj/backtype/storm/daemon/nimbus.clj#L639)
-- Nimbus normalizes the topology configuration. The main purpose of 
normalization is to ensure that every single task will have the same 
serialization registrations, which is critical for getting serialization 
working correctly. 
[code](https://github.com/apache/incubator-storm/blob/0.7.1/src/clj/backtype/storm/daemon/nimbus.clj#L557)
 
-- Nimbus sets up the static state for the topology 
[code](https://github.com/apache/incubator-storm/blob/0.7.1/src/clj/backtype/storm/daemon/nimbus.clj#L661)
+- Nimbus receives the topology submission. 
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/nimbus.clj#L639)
+- Nimbus normalizes the topology configuration. The main purpose of 
normalization is to ensure that every single task will have the same 
serialization registrations, which is critical for getting serialization 
working correctly. 
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/nimbus.clj#L557)
+- Nimbus sets up the static state for the topology 
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/nimbus.clj#L661)
     - Jars and configs are kept on local filesystem because they're too big 
for Zookeeper. The jar and configs are copied into the path {nimbus local 
dir}/stormdist/{topology id}
     - `setup-storm-static` writes task -> component mapping into ZK
     - `setup-heartbeats` creates a ZK "directory" in which tasks can heartbeat
-- Nimbus calls `mk-assignment` to assign tasks to machines 
[code](https://github.com/apache/incubator-storm/blob/0.7.1/src/clj/backtype/storm/daemon/nimbus.clj#L458)
 
-    - Assignment record definition is here: 
[code](https://github.com/apache/incubator-storm/blob/0.7.1/src/clj/backtype/storm/daemon/common.clj#L25)
+- Nimbus calls `mk-assignment` to assign tasks to machines 
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/nimbus.clj#L458)
+    - Assignment record definition is here: 
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/common.clj#L25)
     - Assignment contains:
       - `master-code-dir`: used by supervisors to download the correct 
jars/configs for the topology from Nimbus
       - `task->node+port`: Map from a task id to the worker that task should 
be running on. (A worker is identified by a node/port pair)
       - `node->host`: A map from node id to hostname. This is used so workers 
know which machines to connect to to communicate with other workers. Node ids 
are used to identify supervisors so that multiple supervisors can be run on one 
machine. One place this is done is with Mesos integration.
       - `task->start-time-secs`: Contains a map from task id to the timestamp 
at which Nimbus launched that task. This is used by Nimbus when monitoring 
topologies, as tasks are given a longer timeout to heartbeat when they're first 
launched (the launch timeout is configured by "nimbus.task.launch.secs" config)
-- Once topologies are assigned, they're initially in a deactivated mode. 
`start-storm` writes data into Zookeeper so that the cluster knows the topology 
is active and can start emitting tuples from spouts. 
[code](https://github.com/apache/incubator-storm/blob/0.7.1/src/clj/backtype/storm/daemon/nimbus.clj#L504)
+- Once topologies are assigned, they're initially in a deactivated mode. 
`start-storm` writes data into Zookeeper so that the cluster knows the topology 
is active and can start emitting tuples from spouts. 
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/nimbus.clj#L504)
 
 - TODO cluster state diagram (show all nodes and what's kept everywhere)
 
 - Supervisor runs two functions in the background:
-    - `synchronize-supervisor`: This is called whenever assignments in 
Zookeeper change and also every 10 seconds. 
[code](https://github.com/apache/incubator-storm/blob/0.7.1/src/clj/backtype/storm/daemon/supervisor.clj#L241)
-      - Downloads code from Nimbus for topologies assigned to this machine for 
which it doesn't have the code yet. 
[code](https://github.com/apache/incubator-storm/blob/0.7.1/src/clj/backtype/storm/daemon/supervisor.clj#L258)
-      - Writes into local filesystem what this node is supposed to be running. 
It writes a map from port -> LocalAssignment. LocalAssignment contains a 
topology id as well as the list of task ids for that worker. 
[code](https://github.com/apache/incubator-storm/blob/0.7.1/src/clj/backtype/storm/daemon/supervisor.clj#L13)
-    - `sync-processes`: Reads from the LFS what `synchronize-supervisor` wrote 
and compares that to what's actually running on the machine. It then 
starts/stops worker processes as necessary to synchronize. 
[code](https://github.com/apache/incubator-storm/blob/0.7.1/src/clj/backtype/storm/daemon/supervisor.clj#L177)
+    - `synchronize-supervisor`: This is called whenever assignments in 
Zookeeper change and also every 10 seconds. 
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/supervisor.clj#L241)
+      - Downloads code from Nimbus for topologies assigned to this machine for 
which it doesn't have the code yet. 
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/supervisor.clj#L258)
+      - Writes into local filesystem what this node is supposed to be running. 
It writes a map from port -> LocalAssignment. LocalAssignment contains a 
topology id as well as the list of task ids for that worker. 
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/supervisor.clj#L13)
+    - `sync-processes`: Reads from the LFS what `synchronize-supervisor` wrote 
and compares that to what's actually running on the machine. It then 
starts/stops worker processes as necessary to synchronize. 
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/supervisor.clj#L177)
     
-- Worker processes start up through the `mk-worker` function 
[code](https://github.com/apache/incubator-storm/blob/0.7.1/src/clj/backtype/storm/daemon/worker.clj#L67)
-  - Worker connects to other workers and starts a thread to monitor for 
changes. So if a worker gets reassigned, the worker will automatically 
reconnect to the other worker's new location. 
[code](https://github.com/apache/incubator-storm/blob/0.7.1/src/clj/backtype/storm/daemon/worker.clj#L123)
-  - Monitors whether a topology is active or not and stores that state in the 
`storm-active-atom` variable. This variable is used by tasks to determine 
whether or not to call `nextTuple` on the spouts. 
[code](https://github.com/apache/incubator-storm/blob/0.7.1/src/clj/backtype/storm/daemon/worker.clj#L155)
-  - The worker launches the actual tasks as threads within it 
[code](https://github.com/apache/incubator-storm/blob/0.7.1/src/clj/backtype/storm/daemon/worker.clj#L178)
-- Tasks are set up through the `mk-task` function 
[code](https://github.com/apache/incubator-storm/blob/0.7.1/src/clj/backtype/storm/daemon/task.clj#L160)
-  - Tasks set up routing function which takes in a stream and an output tuple 
and returns a list of task ids to send the tuple to 
[code](https://github.com/apache/incubator-storm/blob/0.7.1/src/clj/backtype/storm/daemon/task.clj#L207)
 (there's also a 3-arity version used for direct streams)
-  - Tasks set up the spout-specific or bolt-specific code with 
[code](https://github.com/apache/incubator-storm/blob/0.7.1/src/clj/backtype/storm/daemon/task.clj#L241)
+- Worker processes start up through the `mk-worker` function 
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/worker.clj#L67)
+  - Worker connects to other workers and starts a thread to monitor for 
changes. So if a worker gets reassigned, the worker will automatically 
reconnect to the other worker's new location. 
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/worker.clj#L123)
+  - Monitors whether a topology is active or not and stores that state in the 
`storm-active-atom` variable. This variable is used by tasks to determine 
whether or not to call `nextTuple` on the spouts. 
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/worker.clj#L155)
+  - The worker launches the actual tasks as threads within it 
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/worker.clj#L178)
+- Tasks are set up through the `mk-task` function 
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/task.clj#L160)
+  - Tasks set up routing function which takes in a stream and an output tuple 
and returns a list of task ids to send the tuple to 
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/task.clj#L207)
 (there's also a 3-arity version used for direct streams)
+  - Tasks set up the spout-specific or bolt-specific code with 
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/task.clj#L241)
    
 ## Topology Monitoring
 
 - Nimbus monitors the topology during its lifetime
-   - Schedules recurring task on the timer thread to check the topologies 
[code](https://github.com/apache/incubator-storm/blob/0.7.1/src/clj/backtype/storm/daemon/nimbus.clj#L623)
-   - Nimbus's behavior is represented as a finite state machine 
[code](https://github.com/apache/incubator-storm/blob/0.7.1/src/clj/backtype/storm/daemon/nimbus.clj#L98)
-   - The "monitor" event is called on a topology every 
"nimbus.monitor.freq.secs", which calls `reassign-topology` through 
`reassign-transition` 
[code](https://github.com/apache/incubator-storm/blob/0.7.1/src/clj/backtype/storm/daemon/nimbus.clj#L497)
+   - Schedules recurring task on the timer thread to check the topologies 
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/nimbus.clj#L623)
+   - Nimbus's behavior is represented as a finite state machine 
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/nimbus.clj#L98)
+   - The "monitor" event is called on a topology every 
"nimbus.monitor.freq.secs", which calls `reassign-topology` through 
`reassign-transition` 
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/nimbus.clj#L497)
    - `reassign-topology` calls `mk-assignments`, the same function used to 
assign the topology the first time. `mk-assignments` is also capable of 
incrementally updating a topology
       - `mk-assignments` checks heartbeats and reassigns workers as necessary
       - Any reassignments change the state in ZK, which will trigger 
supervisors to synchronize and start/stop workers
       
 ## Killing a topology
 
-- "storm kill" command runs this code which just calls the Nimbus Thrift 
interface to kill the topology: 
[code](https://github.com/apache/incubator-storm/blob/0.7.1/src/clj/backtype/storm/command/kill_topology.clj)
-- Nimbus receives the kill command 
[code](https://github.com/apache/incubator-storm/blob/0.7.1/src/clj/backtype/storm/daemon/nimbus.clj#L671)
-- Nimbus applies the "kill" transition to the topology 
[code](https://github.com/apache/incubator-storm/blob/0.7.1/src/clj/backtype/storm/daemon/nimbus.clj#L676)
-- The kill transition function changes the status of the topology to "killed" 
and schedules the "remove" event to run "wait time seconds" in the future. 
[code](https://github.com/apache/incubator-storm/blob/0.7.1/src/clj/backtype/storm/daemon/nimbus.clj#L63)
+- "storm kill" command runs this code which just calls the Nimbus Thrift 
interface to kill the topology: 
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/command/kill_topology.clj)
+- Nimbus receives the kill command 
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/nimbus.clj#L671)
+- Nimbus applies the "kill" transition to the topology 
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/nimbus.clj#L676)
+- The kill transition function changes the status of the topology to "killed" 
and schedules the "remove" event to run "wait time seconds" in the future. 
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/nimbus.clj#L63)
    - The wait time defaults to the topology message timeout but can be 
overridden with the -w flag in the "storm kill" command
    - This causes the topology to be deactivated for the wait time before its 
actually shut down. This gives the topology a chance to finish processing what 
it's currently processing before shutting down the workers
-   - Changing the status during the kill transition ensures that the kill 
protocol is fault-tolerant to Nimbus crashing. On startup, if the status of the 
topology is "killed", Nimbus schedules the remove event to run "wait time 
seconds" in the future 
[code](https://github.com/apache/incubator-storm/blob/0.7.1/src/clj/backtype/storm/daemon/nimbus.clj#L111)
-- Removing a topology cleans out the assignment and static information from ZK 
[code](https://github.com/apache/incubator-storm/blob/0.7.1/src/clj/backtype/storm/daemon/nimbus.clj#L116)
-- A separate cleanup thread runs the `do-cleanup` function which will clean up 
the heartbeat dir and the jars/configs stored locally. 
[code](https://github.com/apache/incubator-storm/blob/0.7.1/src/clj/backtype/storm/daemon/nimbus.clj#L577)
+   - Changing the status during the kill transition ensures that the kill 
protocol is fault-tolerant to Nimbus crashing. On startup, if the status of the 
topology is "killed", Nimbus schedules the remove event to run "wait time 
seconds" in the future 
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/nimbus.clj#L111)
+- Removing a topology cleans out the assignment and static information from ZK 
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/nimbus.clj#L116)
+- A separate cleanup thread runs the `do-cleanup` function which will clean up 
the heartbeat dir and the jars/configs stored locally. 
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/nimbus.clj#L577)

http://git-wip-us.apache.org/repos/asf/storm/blob/97f0558e/docs/Local-mode.md
----------------------------------------------------------------------
diff --git a/docs/Local-mode.md b/docs/Local-mode.md
index 1f98e36..f871a73 100644
--- a/docs/Local-mode.md
+++ b/docs/Local-mode.md
@@ -1,5 +1,7 @@
 ---
+title: Local Mode
 layout: documentation
+documentation: true
 ---
 Local mode simulates a Storm cluster in process and is useful for developing 
and testing topologies. Running topologies in local mode is similar to running 
topologies [on a cluster](Running-topologies-on-a-production-cluster.html). 
 

http://git-wip-us.apache.org/repos/asf/storm/blob/97f0558e/docs/Maven.md
----------------------------------------------------------------------
diff --git a/docs/Maven.md b/docs/Maven.md
index 85828da..0c09c2c 100644
--- a/docs/Maven.md
+++ b/docs/Maven.md
@@ -1,56 +1,22 @@
 ---
+title: Maven
 layout: documentation
+documentation: true
 ---
-To develop topologies, you'll need the Storm jars on your classpath. You 
should either include the unpacked jars in the classpath for your project or 
use Maven to include Storm as a development dependency. Storm is hosted on 
Clojars (a Maven repository). To include Storm in your project as a development 
dependency, add the following to your pom.xml:
+To develop topologies, you'll need the Storm jars on your classpath. You 
should either include the unpacked jars in the classpath for your project or 
use Maven to include Storm as a development dependency. Storm is hosted on 
Maven Central. To include Storm in your project as a development dependency, 
add the following to your pom.xml:
 
-```xml
-<repository>
-  <id>clojars.org</id>
-  <url>http://clojars.org/repo</url>
-</repository>
-```
 
 ```xml
 <dependency>
-  <groupId>storm</groupId>
-  <artifactId>storm</artifactId>
-  <version>0.7.2</version>
-  <scope>test</scope>
+  <groupId>org.apache.storm</groupId>
+  <artifactId>storm-core</artifactId>
+  <version>{{page.version}}</version>
+  <scope>provided</scope>
 </dependency>
 ```
 
-[Here's an 
example](https://github.com/nathanmarz/storm-starter/blob/master/m2-pom.xml) of 
a pom.xml for a Storm project.
-
-If Maven isn't your thing, check out 
[leiningen](https://github.com/technomancy/leiningen). Leiningen is a build 
tool for Clojure, but it can be used for pure Java projects as well. Leiningen 
makes builds and dependency management using Maven dead-simple. Here's an 
example project.clj for a pure-Java Storm project:
-
-```clojure
-(defproject storm-starter "0.0.1-SNAPSHOT"
-  :java-source-path "src/jvm"
-  :javac-options {:debug "true" :fork "true"}
-  :jvm-opts ["-Djava.library.path=/usr/local/lib:/opt/local/lib:/usr/lib"]
-  :dependencies []
-  :dev-dependencies [
-                     [storm "0.7.2"]
-                     ])
-```
-
-You can fetch dependencies using `lein deps`, build the project with `lein 
compile`, and make a jar suitable for submitting to a cluster with `lein 
uberjar`.
-
-### Using Storm as a library
-
-If you want to use Storm as a library (e.g., use the Distributed RPC client) 
and have the Storm dependency jars be distributed with your application, 
there's a separate Maven dependency called "storm/storm-lib". The only 
difference between this dependency and the usual "storm/storm" is that 
storm-lib does not have any logging configured.
+[Here's an example]({{page.git-blob-base}}/examples/storm-starter/pom.xml) of 
a pom.xml for a Storm project.
 
 ### Developing Storm
 
-You will want to
-
-       bash ./bin/install_zmq.sh   # install the jzmq dependency
-       lein sub install
-
-Build javadocs with
-
-       bash ./bin/javadoc.sh
-
-### Building a Storm Release
-
-Use the file `bin/build_release.sh` to make a zipfile like the ones you would 
download (and like what the bin files require in order to run daemons).
+Please refer to [DEVELOPER.md]({{page.git-blob-base}}/DEVELOPER.md) for more 
details.

http://git-wip-us.apache.org/repos/asf/storm/blob/97f0558e/docs/Message-passing-implementation.md
----------------------------------------------------------------------
diff --git a/docs/Message-passing-implementation.md 
b/docs/Message-passing-implementation.md
index f22a5aa..a17f66a 100644
--- a/docs/Message-passing-implementation.md
+++ b/docs/Message-passing-implementation.md
@@ -1,28 +1,30 @@
 ---
+title: Message Passing Implementation
 layout: documentation
+documentation: true
 ---
 (Note: this walkthrough is out of date as of 0.8.0. 0.8.0 revamped the message 
passing infrastructure to be based on the Disruptor)
 
 This page walks through how emitting and transferring tuples works in Storm.
 
 - Worker is responsible for message transfer
-   - `refresh-connections` is called every "task.refresh.poll.secs" or 
whenever assignment in ZK changes. It manages connections to other workers and 
maintains a mapping from task -> worker 
[code](https://github.com/apache/incubator-storm/blob/0.7.1/src/clj/backtype/storm/daemon/worker.clj#L123)
-   - Provides a "transfer function" that is used by tasks to send tuples to 
other tasks. The transfer function takes in a task id and a tuple, and it 
serializes the tuple and puts it onto a "transfer queue". There is a single 
transfer queue for each worker. 
[code](https://github.com/apache/incubator-storm/blob/0.7.1/src/clj/backtype/storm/daemon/worker.clj#L56)
-   - The serializer is thread-safe 
[code](https://github.com/apache/incubator-storm/blob/0.7.1/src/jvm/backtype/storm/serialization/KryoTupleSerializer.java#L26)
-   - The worker has a single thread which drains the transfer queue and sends 
the messages to other workers 
[code](https://github.com/apache/incubator-storm/blob/0.7.1/src/clj/backtype/storm/daemon/worker.clj#L185)
-   - Message sending happens through this protocol: 
[code](https://github.com/apache/incubator-storm/blob/0.7.1/src/clj/backtype/storm/messaging/protocol.clj)
-   - The implementation for distributed mode uses ZeroMQ 
[code](https://github.com/apache/incubator-storm/blob/0.7.1/src/clj/backtype/storm/messaging/zmq.clj)
-   - The implementation for local mode uses in memory Java queues (so that 
it's easy to use Storm locally without needing to get ZeroMQ installed) 
[code](https://github.com/apache/incubator-storm/blob/0.7.1/src/clj/backtype/storm/messaging/local.clj)
+   - `refresh-connections` is called every "task.refresh.poll.secs" or 
whenever assignment in ZK changes. It manages connections to other workers and 
maintains a mapping from task -> worker 
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/worker.clj#L123)
+   - Provides a "transfer function" that is used by tasks to send tuples to 
other tasks. The transfer function takes in a task id and a tuple, and it 
serializes the tuple and puts it onto a "transfer queue". There is a single 
transfer queue for each worker. 
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/worker.clj#L56)
+   - The serializer is thread-safe 
[code](https://github.com/apache/storm/blob/0.7.1/src/jvm/backtype/storm/serialization/KryoTupleSerializer.java#L26)
+   - The worker has a single thread which drains the transfer queue and sends 
the messages to other workers 
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/worker.clj#L185)
+   - Message sending happens through this protocol: 
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/messaging/protocol.clj)
+   - The implementation for distributed mode uses ZeroMQ 
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/messaging/zmq.clj)
+   - The implementation for local mode uses in memory Java queues (so that 
it's easy to use Storm locally without needing to get ZeroMQ installed) 
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/messaging/local.clj)
 - Receiving messages in tasks works differently in local mode and distributed 
mode
-   - In local mode, the tuple is sent directly to an in-memory queue for the 
receiving task 
[code](https://github.com/apache/incubator-storm/blob/master/src/clj/backtype/storm/messaging/local.clj#L21)
-   - In distributed mode, each worker listens on a single TCP port for 
incoming messages and then routes those messages in-memory to tasks. The TCP 
port is called a "virtual port", because it receives [task id, message] and 
then routes it to the actual task. 
[code](https://github.com/apache/incubator-storm/blob/master/src/clj/backtype/storm/daemon/worker.clj#L204)
-      - The virtual port implementation is here: 
[code](https://github.com/apache/incubator-storm/blob/master/src/clj/zilch/virtual_port.clj)
-      - Tasks listen on an in-memory ZeroMQ port for messages from the virtual 
port 
[code](https://github.com/apache/incubator-storm/blob/master/src/clj/backtype/storm/daemon/task.clj#L201)
-        - Bolts listen here: 
[code](https://github.com/apache/incubator-storm/blob/master/src/clj/backtype/storm/daemon/task.clj#L489)
-        - Spouts listen here: 
[code](https://github.com/apache/incubator-storm/blob/master/src/clj/backtype/storm/daemon/task.clj#L382)
+   - In local mode, the tuple is sent directly to an in-memory queue for the 
receiving task 
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/messaging/local.clj#L21)
+   - In distributed mode, each worker listens on a single TCP port for 
incoming messages and then routes those messages in-memory to tasks. The TCP 
port is called a "virtual port", because it receives [task id, message] and 
then routes it to the actual task. 
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/worker.clj#L204)
+      - The virtual port implementation is here: 
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/zilch/virtual_port.clj)
+      - Tasks listen on an in-memory ZeroMQ port for messages from the virtual 
port 
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/task.clj#L201)
+        - Bolts listen here: 
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/task.clj#L489)
+        - Spouts listen here: 
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/task.clj#L382)
 - Tasks are responsible for message routing. A tuple is emitted either to a 
direct stream (where the task id is specified) or a regular stream. In direct 
streams, the message is only sent if that bolt subscribes to that direct 
stream. In regular streams, the stream grouping functions are used to determine 
the task ids to send the tuple to.
-  - Tasks have a routing map from {stream id} -> {component id} -> {stream 
grouping function} 
[code](https://github.com/apache/incubator-storm/blob/master/src/clj/backtype/storm/daemon/task.clj#L198)
-  - The "tasks-fn" returns the task ids to send the tuples to for either 
regular stream emit or direct stream emit 
[code](https://github.com/apache/incubator-storm/blob/master/src/clj/backtype/storm/daemon/task.clj#L207)
+  - Tasks have a routing map from {stream id} -> {component id} -> {stream 
grouping function} 
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/task.clj#L198)
+  - The "tasks-fn" returns the task ids to send the tuples to for either 
regular stream emit or direct stream emit 
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/task.clj#L207)
   - After getting the output task ids, bolts and spouts use the transfer-fn 
provided by the worker to actually transfer the tuples
-      - Bolt transfer code here: 
[code](https://github.com/apache/incubator-storm/blob/master/src/clj/backtype/storm/daemon/task.clj#L429)
-      - Spout transfer code here: 
[code](https://github.com/apache/incubator-storm/blob/master/src/clj/backtype/storm/daemon/task.clj#L329)
+      - Bolt transfer code here: 
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/task.clj#L429)
+      - Spout transfer code here: 
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/task.clj#L329)

http://git-wip-us.apache.org/repos/asf/storm/blob/97f0558e/docs/Metrics.md
----------------------------------------------------------------------
diff --git a/docs/Metrics.md b/docs/Metrics.md
index f43f8c7..b2521b1 100644
--- a/docs/Metrics.md
+++ b/docs/Metrics.md
@@ -1,5 +1,7 @@
 ---
+title: Storm Metrics
 layout: documentation
+documentation: true
 ---
 Storm exposes a metrics interface to report summary statistics across the full 
topology.
 It's used internally to track the numbers you see in the Nimbus UI console: 
counts of executes and acks; average process latency per bolt; worker heap 
usage; and so forth.
@@ -10,13 +12,13 @@ Metrics have to implement just one method, 
`getValueAndReset` -- do any remainin
 
 Storm gives you these metric types:
 
-* [AssignableMetric]() -- set the metric to the explicit value you supply. 
Useful if it's an external value or in the case that you are already 
calculating the summary statistic yourself.
-* 
[CombinedMetric](https://github.com/apache/incubator-storm/blob/master/storm-core/src/jvm/backtype/storm/metric/api/CombinedMetric.java)
 -- generic interface for metrics that can be updated associatively. 
-* 
[CountMetric](https://github.com/apache/incubator-storm/blob/master/storm-core/src/jvm/backtype/storm/metric/api/CountMetric.java)
 -- a running total of the supplied values. Call `incr()` to increment by one, 
`incrBy(n)` to add/subtract the given number.
-  - 
[MultiCountMetric](https://github.com/apache/incubator-storm/blob/master/storm-core/src/jvm/backtype/storm/metric/api/MultiCountMetric.java)
 -- a hashmap of count metrics.
-* 
[ReducedMetric](https://github.com/apache/incubator-storm/blob/master/storm-core/src/jvm/backtype/storm/metric/api/ReducedMetric.java)
-  - 
[MeanReducer](https://github.com/apache/incubator-storm/blob/master/storm-core/src/jvm/backtype/storm/metric/api/MeanReducer.java)
 -- track a running average of values given to its `reduce()` method. (It 
accepts `Double`, `Integer` or `Long` values, and maintains the internal 
average as a `Double`.) Despite his reputation, the MeanReducer is actually a 
pretty nice guy in person.
-  - 
[MultiReducedMetric](https://github.com/apache/incubator-storm/blob/master/storm-core/src/jvm/backtype/storm/metric/api/MultiReducedMetric.java)
 -- a hashmap of reduced metrics.
+* 
[AssignableMetric]({{page.git-blob-base}}/storm-core/src/jvm/backtype/storm/metric/api/AssignableMetric.java)
 -- set the metric to the explicit value you supply. Useful if it's an external 
value or in the case that you are already calculating the summary statistic 
yourself.
+* 
[CombinedMetric]({{page.git-blob-base}}/storm-core/src/jvm/backtype/storm/metric/api/CombinedMetric.java)
 -- generic interface for metrics that can be updated associatively. 
+* 
[CountMetric]({{page.git-blob-base}}/storm-core/src/jvm/backtype/storm/metric/api/CountMetric.java)
 -- a running total of the supplied values. Call `incr()` to increment by one, 
`incrBy(n)` to add/subtract the given number.
+  - 
[MultiCountMetric]({{page.git-blob-base}}/storm-core/src/jvm/backtype/storm/metric/api/MultiCountMetric.java)
 -- a hashmap of count metrics.
+* 
[ReducedMetric]({{page.git-blob-base}}/storm-core/src/jvm/backtype/storm/metric/api/ReducedMetric.java)
+  - 
[MeanReducer]({{page.git-blob-base}}/storm-core/src/jvm/backtype/storm/metric/api/MeanReducer.java)
 -- track a running average of values given to its `reduce()` method. (It 
accepts `Double`, `Integer` or `Long` values, and maintains the internal 
average as a `Double`.) Despite his reputation, the MeanReducer is actually a 
pretty nice guy in person.
+  - 
[MultiReducedMetric]({{page.git-blob-base}}/storm-core/src/jvm/backtype/storm/metric/api/MultiReducedMetric.java)
 -- a hashmap of reduced metrics.
 
 
 ### Metric Consumer
@@ -28,7 +30,7 @@ Storm gives you these metric types:
 
 ### Builtin Metrics
 
-The [builtin 
metrics](https://github.com/apache/incubator-storm/blob/46c3ba7/storm-core/src/clj/backtype/storm/daemon/builtin_metrics.clj)
 instrument Storm itself.
+The [builtin 
metrics]({{page.git-blob-base}}/storm-core/src/clj/backtype/storm/daemon/builtin_metrics.clj)
 instrument Storm itself.
 
-[builtin_metrics.clj](https://github.com/apache/incubator-storm/blob/46c3ba7/storm-core/src/clj/backtype/storm/daemon/builtin_metrics.clj)
 sets up data structures for the built-in metrics, and facade methods that the 
other framework components can use to update them. The metrics themselves are 
calculated in the calling code -- see for example 
[`ack-spout-msg`](https://github.com/apache/incubator-storm/blob/46c3ba7/storm-core/src/clj/backtype/storm/daemon/executor.clj#358)
  in `clj/b/s/daemon/daemon/executor.clj`
+[builtin_metrics.clj]({{page.git-blob-base}}/storm-core/src/clj/backtype/storm/daemon/builtin_metrics.clj)
 sets up data structures for the built-in metrics, and facade methods that the 
other framework components can use to update them. The metrics themselves are 
calculated in the calling code -- see for example 
[`ack-spout-msg`]({{page.git-blob-base}}/storm-core/src/clj/backtype/storm/daemon/executor.clj#358)
  in `clj/b/s/daemon/daemon/executor.clj`
 

http://git-wip-us.apache.org/repos/asf/storm/blob/97f0558e/docs/Multilang-protocol.md
----------------------------------------------------------------------
diff --git a/docs/Multilang-protocol.md b/docs/Multilang-protocol.md
index a3cb22c..2a90059 100644
--- a/docs/Multilang-protocol.md
+++ b/docs/Multilang-protocol.md
@@ -1,5 +1,7 @@
 ---
+title: Multi-Lang Protocol
 layout: documentation
+documentation: true
 ---
 This page explains the multilang protocol as of Storm 0.7.1. Versions prior to 
0.7.1 used a somewhat different protocol, documented 
[here](Storm-multi-language-protocol-(versions-0.7.0-and-below\).html).
 
@@ -47,7 +49,7 @@ STDIN and STDOUT.
 
 The initial handshake is the same for both types of shell components:
 
-* STDIN: Setup info. This is a JSON object with the Storm configuration, 
Topology context, and a PID directory, like this:
+* STDIN: Setup info. This is a JSON object with the Storm configuration, a PID 
directory, and a topology context, like this:
 
 ```
 {
@@ -55,15 +57,37 @@ The initial handshake is the same for both types of shell 
components:
         "topology.message.timeout.secs": 3,
         // etc
     },
+    "pidDir": "...",
     "context": {
         "task->component": {
             "1": "example-spout",
             "2": "__acker",
-            "3": "example-bolt"
+            "3": "example-bolt1",
+            "4": "example-bolt2"
         },
-        "taskid": 3
-    },
-    "pidDir": "..."
+        "taskid": 3,
+        // Everything below this line is only available in Storm 0.10.0+
+        "componentid": "example-bolt"
+        "stream->target->grouping": {
+               "default": {
+                       "example-bolt2": {
+                               "type": "SHUFFLE"}}},
+        "streams": ["default"],
+               "stream->outputfields": {"default": ["word"]},
+           "source->stream->grouping": {
+               "example-spout": {
+                       "default": {
+                               "type": "FIELDS",
+                               "fields": ["word"]
+                       }
+               }
+           }
+           "source->stream->fields": {
+               "example-spout": {
+                       "default": ["word"]
+               }
+           }
+       }
 }
 ```
 
@@ -71,6 +95,15 @@ Your script should create an empty file named with its PID 
in this directory. e.
 the PID is 1234, so an empty file named 1234 is created in the directory. This
 file lets the supervisor know the PID so it can shutdown the process later on.
 
+As of Storm 0.10.0, the context sent by Storm to shell components has been
+enhanced substantially to include all aspects of the topology context available
+to JVM components.  One key addition is the ability to determine a shell
+component's source and targets (i.e., inputs and outputs) in the topology via
+the `stream->target->grouping` and `source->stream->grouping` dictionaries.  At
+the innermost level of these nested dictionaries, groupings are represented as
+a dictionary that minimally has a `type` key, but can also have a `fields` key
+to specify which fields are involved in a `FIELDS` grouping.
+
 * STDOUT: Your PID, in a JSON object, like `{"pid": 1234}`. The shell 
component will log the PID to its log.
 
 What happens next depends on the type of component:
@@ -219,3 +252,36 @@ A "log" will log a message in the worker log. It looks 
like:
 
 * Note that, as of version 0.7.1, there is no longer any need for a
   shell bolt to 'sync'.
+
+### Handling Heartbeats (0.9.3 and later)
+
+As of Storm 0.9.3, heartbeats have been between ShellSpout/ShellBolt and their
+multi-lang subprocesses to detect hanging/zombie subprocesses.  Any libraries
+for interfacing with Storm via multi-lang must take the following actions
+regarding hearbeats:
+
+#### Spout
+
+Shell spouts are synchronous, so subprocesses always send `sync` commands at 
the
+end of `next()`,  so you should not have to do much to support heartbeats for
+spouts.  That said, you must not let subprocesses sleep more than the worker
+timeout during `next()`.
+
+#### Bolt
+
+Shell bolts are asynchronous, so a ShellBolt will send heartbeat tuples to its
+subprocess periodically.  Heartbeat tuple looks like:
+
+```
+{
+       "id": "-6955786537413359385",
+       "comp": "1",
+       "stream": "__heartbeat",
+       // this shell bolt's system task id
+       "task": -1,
+       "tuple": []
+}
+```
+
+When subprocess receives heartbeat tuple, it must send a `sync` command back to
+ShellBolt.

http://git-wip-us.apache.org/repos/asf/storm/blob/97f0558e/docs/Rationale.md
----------------------------------------------------------------------
diff --git a/docs/Rationale.md b/docs/Rationale.md
index 214266e..45ff396 100644
--- a/docs/Rationale.md
+++ b/docs/Rationale.md
@@ -1,5 +1,7 @@
 ---
+title: Rationale
 layout: documentation
+documentation: true
 ---
 The past decade has seen a revolution in data processing. MapReduce, Hadoop, 
and related technologies have made it possible to store and process data at 
scales previously unthinkable. Unfortunately, these data processing 
technologies are not realtime systems, nor are they meant to be. There's no 
hack that will turn Hadoop into a realtime system; realtime data processing has 
a fundamentally different set of requirements than batch processing.

[15/31] storm git commit: STORM-1617: 0.10.x release docs

Reply via email to