Author: bobby
Date: Fri Mar 18 17:56:59 2016
New Revision: 1735653
URL: http://svn.apache.org/viewvc?rev=1735653&view=rev
Log:
Moved packages in 1.0 release to point to where they are
Modified:
storm/branches/bobby-versioned-site/releases/1.0.0-SNAPSHOT/Clojure-DSL.md
storm/branches/bobby-versioned-site/releases/1.0.0-SNAPSHOT/Command-line-client.md
storm/branches/bobby-versioned-site/releases/1.0.0-SNAPSHOT/Common-patterns.md
storm/branches/bobby-versioned-site/releases/1.0.0-SNAPSHOT/Concepts.md
storm/branches/bobby-versioned-site/releases/1.0.0-SNAPSHOT/Configuration.md
storm/branches/bobby-versioned-site/releases/1.0.0-SNAPSHOT/Distributed-RPC.md
storm/branches/bobby-versioned-site/releases/1.0.0-SNAPSHOT/Guaranteeing-message-processing.md
storm/branches/bobby-versioned-site/releases/1.0.0-SNAPSHOT/Hooks.md
storm/branches/bobby-versioned-site/releases/1.0.0-SNAPSHOT/Kestrel-and-Storm.md
storm/branches/bobby-versioned-site/releases/1.0.0-SNAPSHOT/Lifecycle-of-a-topology.md
storm/branches/bobby-versioned-site/releases/1.0.0-SNAPSHOT/Local-mode.md
storm/branches/bobby-versioned-site/releases/1.0.0-SNAPSHOT/Message-passing-implementation.md
storm/branches/bobby-versioned-site/releases/1.0.0-SNAPSHOT/Metrics.md
storm/branches/bobby-versioned-site/releases/1.0.0-SNAPSHOT/Resource_Aware_Scheduler_overview.md
storm/branches/bobby-versioned-site/releases/1.0.0-SNAPSHOT/Running-topologies-on-a-production-cluster.md
storm/branches/bobby-versioned-site/releases/1.0.0-SNAPSHOT/SECURITY.md
storm/branches/bobby-versioned-site/releases/1.0.0-SNAPSHOT/STORM-UI-REST-API.md
storm/branches/bobby-versioned-site/releases/1.0.0-SNAPSHOT/Serialization.md
storm/branches/bobby-versioned-site/releases/1.0.0-SNAPSHOT/Structure-of-the-codebase.md
storm/branches/bobby-versioned-site/releases/1.0.0-SNAPSHOT/Transactional-topologies.md
storm/branches/bobby-versioned-site/releases/1.0.0-SNAPSHOT/Trident-API-Overview.md
storm/branches/bobby-versioned-site/releases/1.0.0-SNAPSHOT/Trident-spouts.md
storm/branches/bobby-versioned-site/releases/1.0.0-SNAPSHOT/Trident-state.md
storm/branches/bobby-versioned-site/releases/1.0.0-SNAPSHOT/Troubleshooting.md
storm/branches/bobby-versioned-site/releases/1.0.0-SNAPSHOT/Tutorial.md
storm/branches/bobby-versioned-site/releases/1.0.0-SNAPSHOT/Understanding-the-parallelism-of-a-Storm-topology.md
storm/branches/bobby-versioned-site/releases/1.0.0-SNAPSHOT/distcache-blobstore.md
storm/branches/bobby-versioned-site/releases/1.0.0-SNAPSHOT/flux.md
storm/branches/bobby-versioned-site/releases/1.0.0-SNAPSHOT/nimbus-ha-design.md
storm/branches/bobby-versioned-site/releases/1.0.0-SNAPSHOT/storm-kafka.md
storm/branches/bobby-versioned-site/releases/1.0.0-SNAPSHOT/storm-sql.md
Modified:
storm/branches/bobby-versioned-site/releases/1.0.0-SNAPSHOT/Clojure-DSL.md
URL:
http://svn.apache.org/viewvc/storm/branches/bobby-versioned-site/releases/1.0.0-SNAPSHOT/Clojure-DSL.md?rev=1735653&r1=1735652&r2=1735653&view=diff
==============================================================================
--- storm/branches/bobby-versioned-site/releases/1.0.0-SNAPSHOT/Clojure-DSL.md
(original)
+++ storm/branches/bobby-versioned-site/releases/1.0.0-SNAPSHOT/Clojure-DSL.md
Fri Mar 18 17:56:59 2016
@@ -3,7 +3,7 @@ title: Clojure DSL
layout: documentation
documentation: true
---
-Storm comes with a Clojure DSL for defining spouts, bolts, and topologies. The
Clojure DSL has access to everything the Java API exposes, so if you're a
Clojure user you can code Storm topologies without touching Java at all. The
Clojure DSL is defined in the source in the
[backtype.storm.clojure]({{page.git-blob-base}}/storm-core/src/clj/backtype/storm/clojure.clj)
namespace.
+Storm comes with a Clojure DSL for defining spouts, bolts, and topologies. The
Clojure DSL has access to everything the Java API exposes, so if you're a
Clojure user you can code Storm topologies without touching Java at all. The
Clojure DSL is defined in the source in the
[org.apache.storm.clojure]({{page.git-blob-base}}/storm-core/src/clj/org/apache/storm/clojure.clj)
namespace.
This page outlines all the pieces of the Clojure DSL, including:
@@ -38,11 +38,11 @@ The maps of spout and bolt specs are map
#### spout-spec
-`spout-spec` takes as arguments the spout implementation (an object that
implements [IRichSpout](javadocs/backtype/storm/topology/IRichSpout.html)) and
optional keyword arguments. The only option that exists currently is the `:p`
option, which specifies the parallelism for the spout. If you omit `:p`, the
spout will execute as a single task.
+`spout-spec` takes as arguments the spout implementation (an object that
implements [IRichSpout](javadocs/org/apache/storm/topology/IRichSpout.html))
and optional keyword arguments. The only option that exists currently is the
`:p` option, which specifies the parallelism for the spout. If you omit `:p`,
the spout will execute as a single task.
#### bolt-spec
-`bolt-spec` takes as arguments the input declaration for the bolt, the bolt
implementation (an object that implements
[IRichBolt](javadocs/backtype/storm/topology/IRichBolt.html)), and optional
keyword arguments.
+`bolt-spec` takes as arguments the input declaration for the bolt, the bolt
implementation (an object that implements
[IRichBolt](javadocs/org/apache/storm/topology/IRichBolt.html)), and optional
keyword arguments.
The input declaration is a map from stream ids to stream groupings. A stream
id can have one of two forms:
@@ -254,7 +254,7 @@ The following example illustrates how to
That's all there is to the Clojure DSL. To submit topologies in remote mode or
local mode, just use the `StormSubmitter` or `LocalCluster` classes just like
you would from Java.
-To create topology configs, it's easiest to use the `backtype.storm.config`
namespace which defines constants for all of the possible configs. The
constants are the same as the static constants in the `Config` class, except
with dashes instead of underscores. For example, here's a topology config that
sets the number of workers to 15 and configures the topology in debug mode:
+To create topology configs, it's easiest to use the `org.apache.storm.config`
namespace which defines constants for all of the possible configs. The
constants are the same as the static constants in the `Config` class, except
with dashes instead of underscores. For example, here's a topology config that
sets the number of workers to 15 and configures the topology in debug mode:
```clojure
{TOPOLOGY-DEBUG true
Modified:
storm/branches/bobby-versioned-site/releases/1.0.0-SNAPSHOT/Command-line-client.md
URL:
http://svn.apache.org/viewvc/storm/branches/bobby-versioned-site/releases/1.0.0-SNAPSHOT/Command-line-client.md?rev=1735653&r1=1735652&r2=1735653&view=diff
==============================================================================
---
storm/branches/bobby-versioned-site/releases/1.0.0-SNAPSHOT/Command-line-client.md
(original)
+++
storm/branches/bobby-versioned-site/releases/1.0.0-SNAPSHOT/Command-line-client.md
Fri Mar 18 17:56:59 2016
@@ -25,7 +25,7 @@ These commands are:
Syntax: `storm jar topology-jar-path class ...`
-Runs the main method of `class` with the specified arguments. The storm jars
and configs in `~/.storm` are put on the classpath. The process is configured
so that [StormSubmitter](javadocs/backtype/storm/StormSubmitter.html) will
upload the jar at `topology-jar-path` when the topology is submitted.
+Runs the main method of `class` with the specified arguments. The storm jars
and configs in `~/.storm` are put on the classpath. The process is configured
so that [StormSubmitter](javadocs/org/apache/storm/StormSubmitter.html) will
upload the jar at `topology-jar-path` when the topology is submitted.
### kill
Modified:
storm/branches/bobby-versioned-site/releases/1.0.0-SNAPSHOT/Common-patterns.md
URL:
http://svn.apache.org/viewvc/storm/branches/bobby-versioned-site/releases/1.0.0-SNAPSHOT/Common-patterns.md?rev=1735653&r1=1735652&r2=1735653&view=diff
==============================================================================
---
storm/branches/bobby-versioned-site/releases/1.0.0-SNAPSHOT/Common-patterns.md
(original)
+++
storm/branches/bobby-versioned-site/releases/1.0.0-SNAPSHOT/Common-patterns.md
Fri Mar 18 17:56:59 2016
@@ -39,7 +39,7 @@ If you want reliability in your data pro
If the bolt emits tuples, then you may want to use multi-anchoring to ensure
reliability. It all depends on the specific application. See [Guaranteeing
message processing](Guaranteeing-message-processing.html) for more details on
how reliability works.
### BasicBolt
-Many bolts follow a similar pattern of reading an input tuple, emitting zero
or more tuples based on that input tuple, and then acking that input tuple
immediately at the end of the execute method. Bolts that match this pattern are
things like functions and filters. This is such a common pattern that Storm
exposes an interface called
[IBasicBolt](javadocs/backtype/storm/topology/IBasicBolt.html) that automates
this pattern for you. See [Guaranteeing message
processing](Guaranteeing-message-processing.html) for more information.
+Many bolts follow a similar pattern of reading an input tuple, emitting zero
or more tuples based on that input tuple, and then acking that input tuple
immediately at the end of the execute method. Bolts that match this pattern are
things like functions and filters. This is such a common pattern that Storm
exposes an interface called
[IBasicBolt](javadocs/org/apache/storm/topology/IBasicBolt.html) that automates
this pattern for you. See [Guaranteeing message
processing](Guaranteeing-message-processing.html) for more information.
### In-memory caching + fields grouping combo
@@ -87,11 +87,11 @@ The topology needs an extra layer of pro
### TimeCacheMap for efficiently keeping a cache of things that have been
recently updated
-You sometimes want to keep a cache in memory of items that have been recently
"active" and have items that have been inactive for some time be automatically
expires. [TimeCacheMap](javadocs/backtype/storm/utils/TimeCacheMap.html) is an
efficient data structure for doing this and provides hooks so you can insert
callbacks whenever an item is expired.
+You sometimes want to keep a cache in memory of items that have been recently
"active" and have items that have been inactive for some time be automatically
expires. [TimeCacheMap](javadocs/org/apache/storm/utils/TimeCacheMap.html) is
an efficient data structure for doing this and provides hooks so you can insert
callbacks whenever an item is expired.
### CoordinatedBolt and KeyedFairBolt for Distributed RPC
-When building distributed RPC applications on top of Storm, there are two
common patterns that are usually needed. These are encapsulated by
[CoordinatedBolt](javadocs/backtype/storm/task/CoordinatedBolt.html) and
[KeyedFairBolt](javadocs/backtype/storm/task/KeyedFairBolt.html) which are part
of the "standard library" that ships with the Storm codebase.
+When building distributed RPC applications on top of Storm, there are two
common patterns that are usually needed. These are encapsulated by
[CoordinatedBolt](javadocs/org/apache/storm/task/CoordinatedBolt.html) and
[KeyedFairBolt](javadocs/org/apache/storm/task/KeyedFairBolt.html) which are
part of the "standard library" that ships with the Storm codebase.
`CoordinatedBolt` wraps the bolt containing your logic and figures out when
your bolt has received all the tuples for any given request. It makes heavy use
of direct streams to do this.
Modified:
storm/branches/bobby-versioned-site/releases/1.0.0-SNAPSHOT/Concepts.md
URL:
http://svn.apache.org/viewvc/storm/branches/bobby-versioned-site/releases/1.0.0-SNAPSHOT/Concepts.md?rev=1735653&r1=1735652&r2=1735653&view=diff
==============================================================================
--- storm/branches/bobby-versioned-site/releases/1.0.0-SNAPSHOT/Concepts.md
(original)
+++ storm/branches/bobby-versioned-site/releases/1.0.0-SNAPSHOT/Concepts.md Fri
Mar 18 17:56:59 2016
@@ -21,7 +21,7 @@ The logic for a realtime application is
**Resources:**
-* [TopologyBuilder](javadocs/backtype/storm/topology/TopologyBuilder.html):
use this class to construct topologies in Java
+* [TopologyBuilder](javadocs/org/apache/storm/topology/TopologyBuilder.html):
use this class to construct topologies in Java
* [Running topologies on a production
cluster](Running-topologies-on-a-production-cluster.html)
* [Local mode](Local-mode.html): Read this to learn how to develop and test
topologies in local mode.
@@ -29,28 +29,28 @@ The logic for a realtime application is
The stream is the core abstraction in Storm. A stream is an unbounded sequence
of tuples that is processed and created in parallel in a distributed fashion.
Streams are defined with a schema that names the fields in the stream's tuples.
By default, tuples can contain integers, longs, shorts, bytes, strings,
doubles, floats, booleans, and byte arrays. You can also define your own
serializers so that custom types can be used natively within tuples.
-Every stream is given an id when declared. Since single-stream spouts and
bolts are so common,
[OutputFieldsDeclarer](javadocs/backtype/storm/topology/OutputFieldsDeclarer.html)
has convenience methods for declaring a single stream without specifying an
id. In this case, the stream is given the default id of "default".
+Every stream is given an id when declared. Since single-stream spouts and
bolts are so common,
[OutputFieldsDeclarer](javadocs/org/apache/storm/topology/OutputFieldsDeclarer.html)
has convenience methods for declaring a single stream without specifying an
id. In this case, the stream is given the default id of "default".
**Resources:**
-* [Tuple](javadocs/backtype/storm/tuple/Tuple.html): streams are composed of
tuples
-*
[OutputFieldsDeclarer](javadocs/backtype/storm/topology/OutputFieldsDeclarer.html):
used to declare streams and their schemas
+* [Tuple](javadocs/org/apache/storm/tuple/Tuple.html): streams are composed of
tuples
+*
[OutputFieldsDeclarer](javadocs/org/apache/storm/topology/OutputFieldsDeclarer.html):
used to declare streams and their schemas
* [Serialization](Serialization.html): Information about Storm's dynamic
typing of tuples and declaring custom serializations
### Spouts
A spout is a source of streams in a topology. Generally spouts will read
tuples from an external source and emit them into the topology (e.g. a Kestrel
queue or the Twitter API). Spouts can either be __reliable__ or __unreliable__.
A reliable spout is capable of replaying a tuple if it failed to be processed
by Storm, whereas an unreliable spout forgets about the tuple as soon as it is
emitted.
-Spouts can emit more than one stream. To do so, declare multiple streams using
the `declareStream` method of
[OutputFieldsDeclarer](javadocs/backtype/storm/topology/OutputFieldsDeclarer.html)
and specify the stream to emit to when using the `emit` method on
[SpoutOutputCollector](javadocs/backtype/storm/spout/SpoutOutputCollector.html).
+Spouts can emit more than one stream. To do so, declare multiple streams using
the `declareStream` method of
[OutputFieldsDeclarer](javadocs/org/apache/storm/topology/OutputFieldsDeclarer.html)
and specify the stream to emit to when using the `emit` method on
[SpoutOutputCollector](javadocs/org/apache/storm/spout/SpoutOutputCollector.html).
The main method on spouts is `nextTuple`. `nextTuple` either emits a new tuple
into the topology or simply returns if there are no new tuples to emit. It is
imperative that `nextTuple` does not block for any spout implementation,
because Storm calls all the spout methods on the same thread.
-The other main methods on spouts are `ack` and `fail`. These are called when
Storm detects that a tuple emitted from the spout either successfully completed
through the topology or failed to be completed. `ack` and `fail` are only
called for reliable spouts. See [the
Javadoc](javadocs/backtype/storm/spout/ISpout.html) for more information.
+The other main methods on spouts are `ack` and `fail`. These are called when
Storm detects that a tuple emitted from the spout either successfully completed
through the topology or failed to be completed. `ack` and `fail` are only
called for reliable spouts. See [the
Javadoc](javadocs/org/apache/storm/spout/ISpout.html) for more information.
**Resources:**
-* [IRichSpout](javadocs/backtype/storm/topology/IRichSpout.html): this is the
interface that spouts must implement.
+* [IRichSpout](javadocs/org/apache/storm/topology/IRichSpout.html): this is
the interface that spouts must implement.
* [Guaranteeing message processing](Guaranteeing-message-processing.html)
### Bolts
@@ -59,26 +59,26 @@ All processing in topologies is done in
Bolts can do simple stream transformations. Doing complex stream
transformations often requires multiple steps and thus multiple bolts. For
example, transforming a stream of tweets into a stream of trending images
requires at least two steps: a bolt to do a rolling count of retweets for each
image, and one or more bolts to stream out the top X images (you can do this
particular stream transformation in a more scalable way with three bolts than
with two).
-Bolts can emit more than one stream. To do so, declare multiple streams using
the `declareStream` method of
[OutputFieldsDeclarer](javadocs/backtype/storm/topology/OutputFieldsDeclarer.html)
and specify the stream to emit to when using the `emit` method on
[OutputCollector](javadocs/backtype/storm/task/OutputCollector.html).
+Bolts can emit more than one stream. To do so, declare multiple streams using
the `declareStream` method of
[OutputFieldsDeclarer](javadocs/org/apache/storm/topology/OutputFieldsDeclarer.html)
and specify the stream to emit to when using the `emit` method on
[OutputCollector](javadocs/org/apache/storm/task/OutputCollector.html).
-When you declare a bolt's input streams, you always subscribe to specific
streams of another component. If you want to subscribe to all the streams of
another component, you have to subscribe to each one individually.
[InputDeclarer](javadocs/backtype/storm/topology/InputDeclarer.html) has
syntactic sugar for subscribing to streams declared on the default stream id.
Saying `declarer.shuffleGrouping("1")` subscribes to the default stream on
component "1" and is equivalent to `declarer.shuffleGrouping("1",
DEFAULT_STREAM_ID)`.
+When you declare a bolt's input streams, you always subscribe to specific
streams of another component. If you want to subscribe to all the streams of
another component, you have to subscribe to each one individually.
[InputDeclarer](javadocs/org/apache/storm/topology/InputDeclarer.html) has
syntactic sugar for subscribing to streams declared on the default stream id.
Saying `declarer.shuffleGrouping("1")` subscribes to the default stream on
component "1" and is equivalent to `declarer.shuffleGrouping("1",
DEFAULT_STREAM_ID)`.
-The main method in bolts is the `execute` method which takes in as input a new
tuple. Bolts emit new tuples using the
[OutputCollector](javadocs/backtype/storm/task/OutputCollector.html) object.
Bolts must call the `ack` method on the `OutputCollector` for every tuple they
process so that Storm knows when tuples are completed (and can eventually
determine that its safe to ack the original spout tuples). For the common case
of processing an input tuple, emitting 0 or more tuples based on that tuple,
and then acking the input tuple, Storm provides an
[IBasicBolt](javadocs/backtype/storm/topology/IBasicBolt.html) interface which
does the acking automatically.
+The main method in bolts is the `execute` method which takes in as input a new
tuple. Bolts emit new tuples using the
[OutputCollector](javadocs/org/apache/storm/task/OutputCollector.html) object.
Bolts must call the `ack` method on the `OutputCollector` for every tuple they
process so that Storm knows when tuples are completed (and can eventually
determine that its safe to ack the original spout tuples). For the common case
of processing an input tuple, emitting 0 or more tuples based on that tuple,
and then acking the input tuple, Storm provides an
[IBasicBolt](javadocs/org/apache/storm/topology/IBasicBolt.html) interface
which does the acking automatically.
-Its perfectly fine to launch new threads in bolts that do processing
asynchronously.
[OutputCollector](javadocs/backtype/storm/task/OutputCollector.html) is
thread-safe and can be called at any time.
+Its perfectly fine to launch new threads in bolts that do processing
asynchronously.
[OutputCollector](javadocs/org/apache/storm/task/OutputCollector.html) is
thread-safe and can be called at any time.
**Resources:**
-* [IRichBolt](javadocs/backtype/storm/topology/IRichBolt.html): this is
general interface for bolts.
-* [IBasicBolt](javadocs/backtype/storm/topology/IBasicBolt.html): this is a
convenience interface for defining bolts that do filtering or simple functions.
-* [OutputCollector](javadocs/backtype/storm/task/OutputCollector.html): bolts
emit tuples to their output streams using an instance of this class
+* [IRichBolt](javadocs/org/apache/storm/topology/IRichBolt.html): this is
general interface for bolts.
+* [IBasicBolt](javadocs/org/apache/storm/topology/IBasicBolt.html): this is a
convenience interface for defining bolts that do filtering or simple functions.
+* [OutputCollector](javadocs/org/apache/storm/task/OutputCollector.html):
bolts emit tuples to their output streams using an instance of this class
* [Guaranteeing message processing](Guaranteeing-message-processing.html)
### Stream groupings
Part of defining a topology is specifying for each bolt which streams it
should receive as input. A stream grouping defines how that stream should be
partitioned among the bolt's tasks.
-There are eight built-in stream groupings in Storm, and you can implement a
custom stream grouping by implementing the
[CustomStreamGrouping](javadocs/backtype/storm/grouping/CustomStreamGrouping.html)
interface:
+There are eight built-in stream groupings in Storm, and you can implement a
custom stream grouping by implementing the
[CustomStreamGrouping](javadocs/org/apache/storm/grouping/CustomStreamGrouping.html)
interface:
1. **Shuffle grouping**: Tuples are randomly distributed across the bolt's
tasks in a way such that each bolt is guaranteed to get an equal number of
tuples.
2. **Fields grouping**: The stream is partitioned by the fields specified in
the grouping. For example, if the stream is grouped by the "user-id" field,
tuples with the same "user-id" will always go to the same task, but tuples with
different "user-id"'s may go to different tasks.
@@ -86,25 +86,25 @@ There are eight built-in stream grouping
4. **All grouping**: The stream is replicated across all the bolt's tasks. Use
this grouping with care.
5. **Global grouping**: The entire stream goes to a single one of the bolt's
tasks. Specifically, it goes to the task with the lowest id.
6. **None grouping**: This grouping specifies that you don't care how the
stream is grouped. Currently, none groupings are equivalent to shuffle
groupings. Eventually though, Storm will push down bolts with none groupings to
execute in the same thread as the bolt or spout they subscribe from (when
possible).
-7. **Direct grouping**: This is a special kind of grouping. A stream grouped
this way means that the __producer__ of the tuple decides which task of the
consumer will receive this tuple. Direct groupings can only be declared on
streams that have been declared as direct streams. Tuples emitted to a direct
stream must be emitted using one of the
[emitDirect](javadocs/backtype/storm/task/OutputCollector.html#emitDirect(int,
int, java.util.List) methods. A bolt can get the task ids of its consumers by
either using the provided
[TopologyContext](javadocs/backtype/storm/task/TopologyContext.html) or by
keeping track of the output of the `emit` method in
[OutputCollector](javadocs/backtype/storm/task/OutputCollector.html) (which
returns the task ids that the tuple was sent to).
+7. **Direct grouping**: This is a special kind of grouping. A stream grouped
this way means that the __producer__ of the tuple decides which task of the
consumer will receive this tuple. Direct groupings can only be declared on
streams that have been declared as direct streams. Tuples emitted to a direct
stream must be emitted using one of the
[emitDirect](javadocs/org/apache/storm/task/OutputCollector.html#emitDirect(int,
int, java.util.List) methods. A bolt can get the task ids of its consumers by
either using the provided
[TopologyContext](javadocs/org/apache/storm/task/TopologyContext.html) or by
keeping track of the output of the `emit` method in
[OutputCollector](javadocs/org/apache/storm/task/OutputCollector.html) (which
returns the task ids that the tuple was sent to).
8. **Local or shuffle grouping**: If the target bolt has one or more tasks in
the same worker process, tuples will be shuffled to just those in-process
tasks. Otherwise, this acts like a normal shuffle grouping.
**Resources:**
-* [TopologyBuilder](javadocs/backtype/storm/topology/TopologyBuilder.html):
use this class to define topologies
-* [InputDeclarer](javadocs/backtype/storm/topology/InputDeclarer.html): this
object is returned whenever `setBolt` is called on `TopologyBuilder` and is
used for declaring a bolt's input streams and how those streams should be
grouped
+* [TopologyBuilder](javadocs/org/apache/storm/topology/TopologyBuilder.html):
use this class to define topologies
+* [InputDeclarer](javadocs/org/apache/storm/topology/InputDeclarer.html): this
object is returned whenever `setBolt` is called on `TopologyBuilder` and is
used for declaring a bolt's input streams and how those streams should be
grouped
### Reliability
Storm guarantees that every spout tuple will be fully processed by the
topology. It does this by tracking the tree of tuples triggered by every spout
tuple and determining when that tree of tuples has been successfully completed.
Every topology has a "message timeout" associated with it. If Storm fails to
detect that a spout tuple has been completed within that timeout, then it fails
the tuple and replays it later.
-To take advantage of Storm's reliability capabilities, you must tell Storm
when new edges in a tuple tree are being created and tell Storm whenever you've
finished processing an individual tuple. These are done using the
[OutputCollector](javadocs/backtype/storm/task/OutputCollector.html) object
that bolts use to emit tuples. Anchoring is done in the `emit` method, and you
declare that you're finished with a tuple using the `ack` method.
+To take advantage of Storm's reliability capabilities, you must tell Storm
when new edges in a tuple tree are being created and tell Storm whenever you've
finished processing an individual tuple. These are done using the
[OutputCollector](javadocs/org/apache/storm/task/OutputCollector.html) object
that bolts use to emit tuples. Anchoring is done in the `emit` method, and you
declare that you're finished with a tuple using the `ack` method.
This is all explained in much more detail in [Guaranteeing message
processing](Guaranteeing-message-processing.html).
### Tasks
-Each spout or bolt executes as many tasks across the cluster. Each task
corresponds to one thread of execution, and stream groupings define how to send
tuples from one set of tasks to another set of tasks. You set the parallelism
for each spout or bolt in the `setSpout` and `setBolt` methods of
[TopologyBuilder](javadocs/backtype/storm/topology/TopologyBuilder.html).
+Each spout or bolt executes as many tasks across the cluster. Each task
corresponds to one thread of execution, and stream groupings define how to send
tuples from one set of tasks to another set of tasks. You set the parallelism
for each spout or bolt in the `setSpout` and `setBolt` methods of
[TopologyBuilder](javadocs/org/apache/storm/topology/TopologyBuilder.html).
### Workers
@@ -112,4 +112,4 @@ Topologies execute across one or more wo
**Resources:**
-*
[Config.TOPOLOGY_WORKERS](javadocs/backtype/storm/Config.html#TOPOLOGY_WORKERS):
this config sets the number of workers to allocate for executing the topology
+*
[Config.TOPOLOGY_WORKERS](javadocs/org/apache/storm/Config.html#TOPOLOGY_WORKERS):
this config sets the number of workers to allocate for executing the topology
Modified:
storm/branches/bobby-versioned-site/releases/1.0.0-SNAPSHOT/Configuration.md
URL:
http://svn.apache.org/viewvc/storm/branches/bobby-versioned-site/releases/1.0.0-SNAPSHOT/Configuration.md?rev=1735653&r1=1735652&r2=1735653&view=diff
==============================================================================
---
storm/branches/bobby-versioned-site/releases/1.0.0-SNAPSHOT/Configuration.md
(original)
+++
storm/branches/bobby-versioned-site/releases/1.0.0-SNAPSHOT/Configuration.md
Fri Mar 18 17:56:59 2016
@@ -5,7 +5,7 @@ documentation: true
---
Storm has a variety of configurations for tweaking the behavior of nimbus,
supervisors, and running topologies. Some configurations are system
configurations and cannot be modified on a topology by topology basis, whereas
other configurations can be modified per topology.
-Every configuration has a default value defined in
[defaults.yaml]({{page.git-blob-base}}/conf/defaults.yaml) in the Storm
codebase. You can override these configurations by defining a storm.yaml in the
classpath of Nimbus and the supervisors. Finally, you can define a
topology-specific configuration that you submit along with your topology when
using [StormSubmitter](javadocs/backtype/storm/StormSubmitter.html). However,
the topology-specific configuration can only override configs prefixed with
"TOPOLOGY".
+Every configuration has a default value defined in
[defaults.yaml]({{page.git-blob-base}}/conf/defaults.yaml) in the Storm
codebase. You can override these configurations by defining a storm.yaml in the
classpath of Nimbus and the supervisors. Finally, you can define a
topology-specific configuration that you submit along with your topology when
using [StormSubmitter](javadocs/org/apache/storm/StormSubmitter.html). However,
the topology-specific configuration can only override configs prefixed with
"TOPOLOGY".
Storm 0.7.0 and onwards lets you override configuration on a
per-bolt/per-spout basis. The only configurations that can be overriden this
way are:
@@ -24,7 +24,7 @@ The preference order for configuration v
**Resources:**
-* [Config](javadocs/backtype/storm/Config.html): a listing of all
configurations as well as a helper class for creating topology specific
configurations
+* [Config](javadocs/org/apache/storm/Config.html): a listing of all
configurations as well as a helper class for creating topology specific
configurations
* [defaults.yaml]({{page.git-blob-base}}/conf/defaults.yaml): the default
values for all configurations
* [Setting up a Storm cluster](Setting-up-a-Storm-cluster.html): explains how
to create and configure a Storm cluster
* [Running topologies on a production
cluster](Running-topologies-on-a-production-cluster.html): lists useful
configurations when running topologies on a cluster
Modified:
storm/branches/bobby-versioned-site/releases/1.0.0-SNAPSHOT/Distributed-RPC.md
URL:
http://svn.apache.org/viewvc/storm/branches/bobby-versioned-site/releases/1.0.0-SNAPSHOT/Distributed-RPC.md?rev=1735653&r1=1735652&r2=1735653&view=diff
==============================================================================
---
storm/branches/bobby-versioned-site/releases/1.0.0-SNAPSHOT/Distributed-RPC.md
(original)
+++
storm/branches/bobby-versioned-site/releases/1.0.0-SNAPSHOT/Distributed-RPC.md
Fri Mar 18 17:56:59 2016
@@ -24,7 +24,7 @@ A client sends the DRPC server the name
### LinearDRPCTopologyBuilder
-Storm comes with a topology builder called
[LinearDRPCTopologyBuilder](javadocs/backtype/storm/drpc/LinearDRPCTopologyBuilder.html)
that automates almost all the steps involved for doing DRPC. These include:
+Storm comes with a topology builder called
[LinearDRPCTopologyBuilder](javadocs/org/apache/storm/drpc/LinearDRPCTopologyBuilder.html)
that automates almost all the steps involved for doing DRPC. These include:
1. Setting up the spout
2. Returning the results to the DRPC server
Modified:
storm/branches/bobby-versioned-site/releases/1.0.0-SNAPSHOT/Guaranteeing-message-processing.md
URL:
http://svn.apache.org/viewvc/storm/branches/bobby-versioned-site/releases/1.0.0-SNAPSHOT/Guaranteeing-message-processing.md?rev=1735653&r1=1735652&r2=1735653&view=diff
==============================================================================
---
storm/branches/bobby-versioned-site/releases/1.0.0-SNAPSHOT/Guaranteeing-message-processing.md
(original)
+++
storm/branches/bobby-versioned-site/releases/1.0.0-SNAPSHOT/Guaranteeing-message-processing.md
Fri Mar 18 17:56:59 2016
@@ -26,11 +26,11 @@ This topology reads sentences off of a K

-Storm considers a tuple coming off a spout "fully processed" when the tuple
tree has been exhausted and every message in the tree has been processed. A
tuple is considered failed when its tree of messages fails to be fully
processed within a specified timeout. This timeout can be configured on a
topology-specific basis using the
[Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS](javadocs/backtype/storm/Config.html#TOPOLOGY_MESSAGE_TIMEOUT_SECS)
configuration and defaults to 30 seconds.
+Storm considers a tuple coming off a spout "fully processed" when the tuple
tree has been exhausted and every message in the tree has been processed. A
tuple is considered failed when its tree of messages fails to be fully
processed within a specified timeout. This timeout can be configured on a
topology-specific basis using the
[Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS](javadocs/org/apache/storm/Config.html#TOPOLOGY_MESSAGE_TIMEOUT_SECS)
configuration and defaults to 30 seconds.
### What happens if a message is fully processed or fails to be fully
processed?
-To understand this question, let's take a look at the lifecycle of a tuple
coming off of a spout. For reference, here is the interface that spouts
implement (see the [Javadoc](javadocs/backtype/storm/spout/ISpout.html) for
more information):
+To understand this question, let's take a look at the lifecycle of a tuple
coming off of a spout. For reference, here is the interface that spouts
implement (see the [Javadoc](javadocs/org/apache/storm/spout/ISpout.html) for
more information):
```java
public interface ISpout extends Serializable {
@@ -132,11 +132,11 @@ In contrast, bolts that do aggregations
### How do I make my applications work correctly given that tuples can be
replayed?
-As always in software design, the answer is "it depends." If you really want
exactly once semantics use the [Trident](Trident-tutorial.html) API. In some
cases, like with a lot of analytics, dropping data is OK so disabling the fault
tolerance by setting the number of acker bolts to 0
[Config.TOPOLOGY_ACKERS](javadocs/backtype/storm/Config.html#TOPOLOGY_ACKERS).
But in some cases you want to be sure that everything was processed at least
once and nothing was dropped. This is especially useful if all operations are
idenpotent or if deduping can happen aferwards.
+As always in software design, the answer is "it depends." If you really want
exactly once semantics use the [Trident](Trident-tutorial.html) API. In some
cases, like with a lot of analytics, dropping data is OK so disabling the fault
tolerance by setting the number of acker bolts to 0
[Config.TOPOLOGY_ACKERS](javadocs/org/apache/storm/Config.html#TOPOLOGY_ACKERS).
But in some cases you want to be sure that everything was processed at least
once and nothing was dropped. This is especially useful if all operations are
idenpotent or if deduping can happen aferwards.
### How does Storm implement reliability in an efficient way?
-A Storm topology has a set of special "acker" tasks that track the DAG of
tuples for every spout tuple. When an acker sees that a DAG is complete, it
sends a message to the spout task that created the spout tuple to ack the
message. You can set the number of acker tasks for a topology in the topology
configuration using
[Config.TOPOLOGY_ACKERS](javadocs/backtype/storm/Config.html#TOPOLOGY_ACKERS).
Storm defaults TOPOLOGY_ACKERS to one task per worker.
+A Storm topology has a set of special "acker" tasks that track the DAG of
tuples for every spout tuple. When an acker sees that a DAG is complete, it
sends a message to the spout task that created the spout tuple to ack the
message. You can set the number of acker tasks for a topology in the topology
configuration using
[Config.TOPOLOGY_ACKERS](javadocs/org/apache/storm/Config.html#TOPOLOGY_ACKERS).
Storm defaults TOPOLOGY_ACKERS to one task per worker.
The best way to understand Storm's reliability implementation is to look at
the lifecycle of tuples and tuple DAGs. When a tuple is created in a topology,
whether in a spout or a bolt, it is given a random 64 bit id. These ids are
used by ackers to track the tuple DAG for every spout tuple.
Modified: storm/branches/bobby-versioned-site/releases/1.0.0-SNAPSHOT/Hooks.md
URL:
http://svn.apache.org/viewvc/storm/branches/bobby-versioned-site/releases/1.0.0-SNAPSHOT/Hooks.md?rev=1735653&r1=1735652&r2=1735653&view=diff
==============================================================================
--- storm/branches/bobby-versioned-site/releases/1.0.0-SNAPSHOT/Hooks.md
(original)
+++ storm/branches/bobby-versioned-site/releases/1.0.0-SNAPSHOT/Hooks.md Fri
Mar 18 17:56:59 2016
@@ -3,7 +3,7 @@ title: Hooks
layout: documentation
documentation: true
---
-Storm provides hooks with which you can insert custom code to run on any
number of events within Storm. You create a hook by extending the
[BaseTaskHook](javadocs/backtype/storm/hooks/BaseTaskHook.html) class and
overriding the appropriate method for the event you want to catch. There are
two ways to register your hook:
+Storm provides hooks with which you can insert custom code to run on any
number of events within Storm. You create a hook by extending the
[BaseTaskHook](javadocs/org/apache/storm/hooks/BaseTaskHook.html) class and
overriding the appropriate method for the event you want to catch. There are
two ways to register your hook:
-1. In the open method of your spout or prepare method of your bolt using the
[TopologyContext](javadocs/backtype/storm/task/TopologyContext.html#addTaskHook)
method.
-2. Through the Storm configuration using the
["topology.auto.task.hooks"](javadocs/backtype/storm/Config.html#TOPOLOGY_AUTO_TASK_HOOKS)
config. These hooks are automatically registered in every spout or bolt, and
are useful for doing things like integrating with a custom monitoring system.
+1. In the open method of your spout or prepare method of your bolt using the
[TopologyContext](javadocs/org/apache/storm/task/TopologyContext.html#addTaskHook)
method.
+2. Through the Storm configuration using the
["topology.auto.task.hooks"](javadocs/org/apache/storm/Config.html#TOPOLOGY_AUTO_TASK_HOOKS)
config. These hooks are automatically registered in every spout or bolt, and
are useful for doing things like integrating with a custom monitoring system.
Modified:
storm/branches/bobby-versioned-site/releases/1.0.0-SNAPSHOT/Kestrel-and-Storm.md
URL:
http://svn.apache.org/viewvc/storm/branches/bobby-versioned-site/releases/1.0.0-SNAPSHOT/Kestrel-and-Storm.md?rev=1735653&r1=1735652&r2=1735653&view=diff
==============================================================================
---
storm/branches/bobby-versioned-site/releases/1.0.0-SNAPSHOT/Kestrel-and-Storm.md
(original)
+++
storm/branches/bobby-versioned-site/releases/1.0.0-SNAPSHOT/Kestrel-and-Storm.md
Fri Mar 18 17:56:59 2016
@@ -12,7 +12,7 @@ This tutorial uses examples from the [st
It assumes you are able to run locally a Kestrel server as described
[here](https://github.com/nathanmarz/storm-kestrel).
## Kestrel Server and Queue
-A single kestrel server has a set of queues. A Kestrel queue is a very simple
message queue that runs on the JVM and uses the memcache protocol (with some
extensions) to talk to clients. For details, look at the implementation of the
[KestrelThriftClient](https://github.com/nathanmarz/storm-kestrel/blob/master/src/jvm/backtype/storm/spout/KestrelThriftClient.java)
class provided in [storm-kestrel](https://github.com/nathanmarz/storm-kestrel)
project.
+A single kestrel server has a set of queues. A Kestrel queue is a very simple
message queue that runs on the JVM and uses the memcache protocol (with some
extensions) to talk to clients. For details, look at the implementation of the
[KestrelThriftClient](https://github.com/nathanmarz/storm-kestrel/blob/master/src/jvm/org/apache/storm/spout/KestrelThriftClient.java)
class provided in [storm-kestrel](https://github.com/nathanmarz/storm-kestrel)
project.
Each queue is strictly ordered following the FIFO (first in, first out)
principle. To keep up with performance items are cached in system memory;
though, only the first 128MB is kept in memory. When stopping the server, the
queue state is stored in a journal file.
@@ -120,9 +120,9 @@ In order to stop it type a closing brack
import java.io.InputStream;
import java.util.Random;
- import backtype.storm.spout.KestrelClient;
- import backtype.storm.spout.KestrelClient.Item;
- import backtype.storm.spout.KestrelClient.ParseError;
+ import org.apache.storm.spout.KestrelClient;
+ import org.apache.storm.spout.KestrelClient.Item;
+ import org.apache.storm.spout.KestrelClient.ParseError;
public class AddSentenceItemsToKestrel {
Modified:
storm/branches/bobby-versioned-site/releases/1.0.0-SNAPSHOT/Lifecycle-of-a-topology.md
URL:
http://svn.apache.org/viewvc/storm/branches/bobby-versioned-site/releases/1.0.0-SNAPSHOT/Lifecycle-of-a-topology.md?rev=1735653&r1=1735652&r2=1735653&view=diff
==============================================================================
---
storm/branches/bobby-versioned-site/releases/1.0.0-SNAPSHOT/Lifecycle-of-a-topology.md
(original)
+++
storm/branches/bobby-versioned-site/releases/1.0.0-SNAPSHOT/Lifecycle-of-a-topology.md
Fri Mar 18 17:56:59 2016
@@ -9,74 +9,74 @@ This page explains in detail the lifecyc
First a couple of important notes about topologies:
-1. The actual topology that runs is different than the topology the user
specifies. The actual topology has implicit streams and an implicit "acker"
bolt added to manage the acking framework (used to guarantee data processing).
The implicit topology is created via the
[system-topology!](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/common.clj#L188)
function.
+1. The actual topology that runs is different than the topology the user
specifies. The actual topology has implicit streams and an implicit "acker"
bolt added to manage the acking framework (used to guarantee data processing).
The implicit topology is created via the
[system-topology!](https://github.com/apache/storm/blob/0.7.1/src/clj/org/apache/storm/daemon/common.clj#L188)
function.
2. `system-topology!` is used in two places:
- - when Nimbus is creating tasks for the topology
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/nimbus.clj#L316)
- - in the worker so it knows where it needs to route messages to
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/worker.clj#L90)
+ - when Nimbus is creating tasks for the topology
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/org/apache/storm/daemon/nimbus.clj#L316)
+ - in the worker so it knows where it needs to route messages to
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/org/apache/storm/daemon/worker.clj#L90)
## Starting a topology
- "storm jar" command executes your class with the specified arguments. The
only special thing that "storm jar" does is set the "storm.jar" environment
variable for use by `StormSubmitter` later.
[code](https://github.com/apache/storm/blob/0.7.1/bin/storm#L101)
- When your code uses `StormSubmitter.submitTopology`, `StormSubmitter` takes
the following actions:
- - First, `StormSubmitter` uploads the jar if it hasn't been uploaded before.
[code](https://github.com/apache/storm/blob/0.7.1/src/jvm/backtype/storm/StormSubmitter.java#L83)
+ - First, `StormSubmitter` uploads the jar if it hasn't been uploaded before.
[code](https://github.com/apache/storm/blob/0.7.1/src/jvm/org/apache/storm/StormSubmitter.java#L83)
- Jar uploading is done via Nimbus's Thrift interface
[code](https://github.com/apache/storm/blob/0.7.1/src/storm.thrift#L200)
- `beginFileUpload` returns a path in Nimbus's inbox
- 15 kilobytes are uploaded at a time through `uploadChunk`
- `finishFileUpload` is called when it's finished uploading
- - Here is Nimbus's implementation of those Thrift methods:
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/nimbus.clj#L694)
- - Second, `StormSubmitter` calls `submitTopology` on the Nimbus thrift
interface
[code](https://github.com/apache/storm/blob/0.7.1/src/jvm/backtype/storm/StormSubmitter.java#L60)
+ - Here is Nimbus's implementation of those Thrift methods:
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/org/apache/storm/daemon/nimbus.clj#L694)
+ - Second, `StormSubmitter` calls `submitTopology` on the Nimbus thrift
interface
[code](https://github.com/apache/storm/blob/0.7.1/src/jvm/org/apache/storm/StormSubmitter.java#L60)
- The topology config is serialized using JSON (JSON is used so that
writing DSL's in any language is as easy as possible)
- Notice that the Thrift `submitTopology` call takes in the Nimbus inbox
path where the jar was uploaded
-- Nimbus receives the topology submission.
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/nimbus.clj#L639)
-- Nimbus normalizes the topology configuration. The main purpose of
normalization is to ensure that every single task will have the same
serialization registrations, which is critical for getting serialization
working correctly.
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/nimbus.clj#L557)
-- Nimbus sets up the static state for the topology
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/nimbus.clj#L661)
+- Nimbus receives the topology submission.
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/org/apache/storm/daemon/nimbus.clj#L639)
+- Nimbus normalizes the topology configuration. The main purpose of
normalization is to ensure that every single task will have the same
serialization registrations, which is critical for getting serialization
working correctly.
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/org/apache/storm/daemon/nimbus.clj#L557)
+- Nimbus sets up the static state for the topology
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/org/apache/storm/daemon/nimbus.clj#L661)
- Jars and configs are kept on local filesystem because they're too big
for Zookeeper. The jar and configs are copied into the path {nimbus local
dir}/stormdist/{topology id}
- `setup-storm-static` writes task -> component mapping into ZK
- `setup-heartbeats` creates a ZK "directory" in which tasks can heartbeat
-- Nimbus calls `mk-assignment` to assign tasks to machines
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/nimbus.clj#L458)
- - Assignment record definition is here:
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/common.clj#L25)
+- Nimbus calls `mk-assignment` to assign tasks to machines
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/org/apache/storm/daemon/nimbus.clj#L458)
+ - Assignment record definition is here:
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/org/apache/storm/daemon/common.clj#L25)
- Assignment contains:
- `master-code-dir`: used by supervisors to download the correct
jars/configs for the topology from Nimbus
- `task->node+port`: Map from a task id to the worker that task should
be running on. (A worker is identified by a node/port pair)
- `node->host`: A map from node id to hostname. This is used so workers
know which machines to connect to to communicate with other workers. Node ids
are used to identify supervisors so that multiple supervisors can be run on one
machine. One place this is done is with Mesos integration.
- `task->start-time-secs`: Contains a map from task id to the timestamp
at which Nimbus launched that task. This is used by Nimbus when monitoring
topologies, as tasks are given a longer timeout to heartbeat when they're first
launched (the launch timeout is configured by "nimbus.task.launch.secs" config)
-- Once topologies are assigned, they're initially in a deactivated mode.
`start-storm` writes data into Zookeeper so that the cluster knows the topology
is active and can start emitting tuples from spouts.
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/nimbus.clj#L504)
+- Once topologies are assigned, they're initially in a deactivated mode.
`start-storm` writes data into Zookeeper so that the cluster knows the topology
is active and can start emitting tuples from spouts.
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/org/apache/storm/daemon/nimbus.clj#L504)
- TODO cluster state diagram (show all nodes and what's kept everywhere)
- Supervisor runs two functions in the background:
- - `synchronize-supervisor`: This is called whenever assignments in
Zookeeper change and also every 10 seconds.
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/supervisor.clj#L241)
- - Downloads code from Nimbus for topologies assigned to this machine for
which it doesn't have the code yet.
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/supervisor.clj#L258)
- - Writes into local filesystem what this node is supposed to be running.
It writes a map from port -> LocalAssignment. LocalAssignment contains a
topology id as well as the list of task ids for that worker.
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/supervisor.clj#L13)
- - `sync-processes`: Reads from the LFS what `synchronize-supervisor` wrote
and compares that to what's actually running on the machine. It then
starts/stops worker processes as necessary to synchronize.
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/supervisor.clj#L177)
+ - `synchronize-supervisor`: This is called whenever assignments in
Zookeeper change and also every 10 seconds.
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/org/apache/storm/daemon/supervisor.clj#L241)
+ - Downloads code from Nimbus for topologies assigned to this machine for
which it doesn't have the code yet.
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/org/apache/storm/daemon/supervisor.clj#L258)
+ - Writes into local filesystem what this node is supposed to be running.
It writes a map from port -> LocalAssignment. LocalAssignment contains a
topology id as well as the list of task ids for that worker.
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/org/apache/storm/daemon/supervisor.clj#L13)
+ - `sync-processes`: Reads from the LFS what `synchronize-supervisor` wrote
and compares that to what's actually running on the machine. It then
starts/stops worker processes as necessary to synchronize.
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/org/apache/storm/daemon/supervisor.clj#L177)
-- Worker processes start up through the `mk-worker` function
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/worker.clj#L67)
- - Worker connects to other workers and starts a thread to monitor for
changes. So if a worker gets reassigned, the worker will automatically
reconnect to the other worker's new location.
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/worker.clj#L123)
- - Monitors whether a topology is active or not and stores that state in the
`storm-active-atom` variable. This variable is used by tasks to determine
whether or not to call `nextTuple` on the spouts.
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/worker.clj#L155)
- - The worker launches the actual tasks as threads within it
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/worker.clj#L178)
-- Tasks are set up through the `mk-task` function
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/task.clj#L160)
- - Tasks set up routing function which takes in a stream and an output tuple
and returns a list of task ids to send the tuple to
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/task.clj#L207)
(there's also a 3-arity version used for direct streams)
- - Tasks set up the spout-specific or bolt-specific code with
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/task.clj#L241)
+- Worker processes start up through the `mk-worker` function
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/org/apache/storm/daemon/worker.clj#L67)
+ - Worker connects to other workers and starts a thread to monitor for
changes. So if a worker gets reassigned, the worker will automatically
reconnect to the other worker's new location.
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/org/apache/storm/daemon/worker.clj#L123)
+ - Monitors whether a topology is active or not and stores that state in the
`storm-active-atom` variable. This variable is used by tasks to determine
whether or not to call `nextTuple` on the spouts.
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/org/apache/storm/daemon/worker.clj#L155)
+ - The worker launches the actual tasks as threads within it
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/org/apache/storm/daemon/worker.clj#L178)
+- Tasks are set up through the `mk-task` function
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/org/apache/storm/daemon/task.clj#L160)
+ - Tasks set up routing function which takes in a stream and an output tuple
and returns a list of task ids to send the tuple to
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/org/apache/storm/daemon/task.clj#L207)
(there's also a 3-arity version used for direct streams)
+ - Tasks set up the spout-specific or bolt-specific code with
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/org/apache/storm/daemon/task.clj#L241)
## Topology Monitoring
- Nimbus monitors the topology during its lifetime
- - Schedules recurring task on the timer thread to check the topologies
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/nimbus.clj#L623)
- - Nimbus's behavior is represented as a finite state machine
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/nimbus.clj#L98)
- - The "monitor" event is called on a topology every
"nimbus.monitor.freq.secs", which calls `reassign-topology` through
`reassign-transition`
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/nimbus.clj#L497)
+ - Schedules recurring task on the timer thread to check the topologies
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/org/apache/storm/daemon/nimbus.clj#L623)
+ - Nimbus's behavior is represented as a finite state machine
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/org/apache/storm/daemon/nimbus.clj#L98)
+ - The "monitor" event is called on a topology every
"nimbus.monitor.freq.secs", which calls `reassign-topology` through
`reassign-transition`
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/org/apache/storm/daemon/nimbus.clj#L497)
- `reassign-topology` calls `mk-assignments`, the same function used to
assign the topology the first time. `mk-assignments` is also capable of
incrementally updating a topology
- `mk-assignments` checks heartbeats and reassigns workers as necessary
- Any reassignments change the state in ZK, which will trigger
supervisors to synchronize and start/stop workers
## Killing a topology
-- "storm kill" command runs this code which just calls the Nimbus Thrift
interface to kill the topology:
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/command/kill_topology.clj)
-- Nimbus receives the kill command
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/nimbus.clj#L671)
-- Nimbus applies the "kill" transition to the topology
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/nimbus.clj#L676)
-- The kill transition function changes the status of the topology to "killed"
and schedules the "remove" event to run "wait time seconds" in the future.
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/nimbus.clj#L63)
+- "storm kill" command runs this code which just calls the Nimbus Thrift
interface to kill the topology:
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/org/apache/storm/command/kill_topology.clj)
+- Nimbus receives the kill command
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/org/apache/storm/daemon/nimbus.clj#L671)
+- Nimbus applies the "kill" transition to the topology
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/org/apache/storm/daemon/nimbus.clj#L676)
+- The kill transition function changes the status of the topology to "killed"
and schedules the "remove" event to run "wait time seconds" in the future.
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/org/apache/storm/daemon/nimbus.clj#L63)
- The wait time defaults to the topology message timeout but can be
overridden with the -w flag in the "storm kill" command
- This causes the topology to be deactivated for the wait time before its
actually shut down. This gives the topology a chance to finish processing what
it's currently processing before shutting down the workers
- - Changing the status during the kill transition ensures that the kill
protocol is fault-tolerant to Nimbus crashing. On startup, if the status of the
topology is "killed", Nimbus schedules the remove event to run "wait time
seconds" in the future
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/nimbus.clj#L111)
-- Removing a topology cleans out the assignment and static information from ZK
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/nimbus.clj#L116)
-- A separate cleanup thread runs the `do-cleanup` function which will clean up
the heartbeat dir and the jars/configs stored locally.
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/nimbus.clj#L577)
+ - Changing the status during the kill transition ensures that the kill
protocol is fault-tolerant to Nimbus crashing. On startup, if the status of the
topology is "killed", Nimbus schedules the remove event to run "wait time
seconds" in the future
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/org/apache/storm/daemon/nimbus.clj#L111)
+- Removing a topology cleans out the assignment and static information from ZK
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/org/apache/storm/daemon/nimbus.clj#L116)
+- A separate cleanup thread runs the `do-cleanup` function which will clean up
the heartbeat dir and the jars/configs stored locally.
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/org/apache/storm/daemon/nimbus.clj#L577)
Modified:
storm/branches/bobby-versioned-site/releases/1.0.0-SNAPSHOT/Local-mode.md
URL:
http://svn.apache.org/viewvc/storm/branches/bobby-versioned-site/releases/1.0.0-SNAPSHOT/Local-mode.md?rev=1735653&r1=1735652&r2=1735653&view=diff
==============================================================================
--- storm/branches/bobby-versioned-site/releases/1.0.0-SNAPSHOT/Local-mode.md
(original)
+++ storm/branches/bobby-versioned-site/releases/1.0.0-SNAPSHOT/Local-mode.md
Fri Mar 18 17:56:59 2016
@@ -8,12 +8,12 @@ Local mode simulates a Storm cluster in
To create an in-process cluster, simply use the `LocalCluster` class. For
example:
```java
-import backtype.storm.LocalCluster;
+import org.apache.storm.LocalCluster;
LocalCluster cluster = new LocalCluster();
```
-You can then submit topologies using the `submitTopology` method on the
`LocalCluster` object. Just like the corresponding method on
[StormSubmitter](javadocs/backtype/storm/StormSubmitter.html), `submitTopology`
takes a name, a topology configuration, and the topology object. You can then
kill a topology using the `killTopology` method which takes the topology name
as an argument.
+You can then submit topologies using the `submitTopology` method on the
`LocalCluster` object. Just like the corresponding method on
[StormSubmitter](javadocs/org/apache/storm/StormSubmitter.html),
`submitTopology` takes a name, a topology configuration, and the topology
object. You can then kill a topology using the `killTopology` method which
takes the topology name as an argument.
To shutdown a local cluster, simple call:
@@ -23,7 +23,7 @@ cluster.shutdown();
### Common configurations for local mode
-You can see a full list of configurations
[here](javadocs/backtype/storm/Config.html).
+You can see a full list of configurations
[here](javadocs/org/apache/storm/Config.html).
1. **Config.TOPOLOGY_MAX_TASK_PARALLELISM**: This config puts a ceiling on the
number of threads spawned for a single component. Oftentimes production
topologies have a lot of parallelism (hundreds of threads) which places
unreasonable load when trying to test the topology in local mode. This config
lets you easy control that parallelism.
2. **Config.TOPOLOGY_DEBUG**: When this is set to true, Storm will log a
message every time a tuple is emitted from any spout or bolt. This is extremely
useful for debugging.
Modified:
storm/branches/bobby-versioned-site/releases/1.0.0-SNAPSHOT/Message-passing-implementation.md
URL:
http://svn.apache.org/viewvc/storm/branches/bobby-versioned-site/releases/1.0.0-SNAPSHOT/Message-passing-implementation.md?rev=1735653&r1=1735652&r2=1735653&view=diff
==============================================================================
---
storm/branches/bobby-versioned-site/releases/1.0.0-SNAPSHOT/Message-passing-implementation.md
(original)
+++
storm/branches/bobby-versioned-site/releases/1.0.0-SNAPSHOT/Message-passing-implementation.md
Fri Mar 18 17:56:59 2016
@@ -8,23 +8,23 @@ documentation: true
This page walks through how emitting and transferring tuples works in Storm.
- Worker is responsible for message transfer
- - `refresh-connections` is called every "task.refresh.poll.secs" or
whenever assignment in ZK changes. It manages connections to other workers and
maintains a mapping from task -> worker
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/worker.clj#L123)
- - Provides a "transfer function" that is used by tasks to send tuples to
other tasks. The transfer function takes in a task id and a tuple, and it
serializes the tuple and puts it onto a "transfer queue". There is a single
transfer queue for each worker.
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/worker.clj#L56)
- - The serializer is thread-safe
[code](https://github.com/apache/storm/blob/0.7.1/src/jvm/backtype/storm/serialization/KryoTupleSerializer.java#L26)
- - The worker has a single thread which drains the transfer queue and sends
the messages to other workers
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/worker.clj#L185)
- - Message sending happens through this protocol:
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/messaging/protocol.clj)
- - The implementation for distributed mode uses ZeroMQ
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/messaging/zmq.clj)
- - The implementation for local mode uses in memory Java queues (so that
it's easy to use Storm locally without needing to get ZeroMQ installed)
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/messaging/local.clj)
+ - `refresh-connections` is called every "task.refresh.poll.secs" or
whenever assignment in ZK changes. It manages connections to other workers and
maintains a mapping from task -> worker
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/org/apache/storm/daemon/worker.clj#L123)
+ - Provides a "transfer function" that is used by tasks to send tuples to
other tasks. The transfer function takes in a task id and a tuple, and it
serializes the tuple and puts it onto a "transfer queue". There is a single
transfer queue for each worker.
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/org/apache/storm/daemon/worker.clj#L56)
+ - The serializer is thread-safe
[code](https://github.com/apache/storm/blob/0.7.1/src/jvm/org/apache/storm/serialization/KryoTupleSerializer.java#L26)
+ - The worker has a single thread which drains the transfer queue and sends
the messages to other workers
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/org/apache/storm/daemon/worker.clj#L185)
+ - Message sending happens through this protocol:
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/org/apache/storm/messaging/protocol.clj)
+ - The implementation for distributed mode uses ZeroMQ
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/org/apache/storm/messaging/zmq.clj)
+ - The implementation for local mode uses in memory Java queues (so that
it's easy to use Storm locally without needing to get ZeroMQ installed)
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/org/apache/storm/messaging/local.clj)
- Receiving messages in tasks works differently in local mode and distributed
mode
- - In local mode, the tuple is sent directly to an in-memory queue for the
receiving task
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/messaging/local.clj#L21)
- - In distributed mode, each worker listens on a single TCP port for
incoming messages and then routes those messages in-memory to tasks. The TCP
port is called a "virtual port", because it receives [task id, message] and
then routes it to the actual task.
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/worker.clj#L204)
+ - In local mode, the tuple is sent directly to an in-memory queue for the
receiving task
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/org/apache/storm/messaging/local.clj#L21)
+ - In distributed mode, each worker listens on a single TCP port for
incoming messages and then routes those messages in-memory to tasks. The TCP
port is called a "virtual port", because it receives [task id, message] and
then routes it to the actual task.
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/org/apache/storm/daemon/worker.clj#L204)
- The virtual port implementation is here:
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/zilch/virtual_port.clj)
- - Tasks listen on an in-memory ZeroMQ port for messages from the virtual
port
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/task.clj#L201)
- - Bolts listen here:
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/task.clj#L489)
- - Spouts listen here:
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/task.clj#L382)
+ - Tasks listen on an in-memory ZeroMQ port for messages from the virtual
port
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/org/apache/storm/daemon/task.clj#L201)
+ - Bolts listen here:
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/org/apache/storm/daemon/task.clj#L489)
+ - Spouts listen here:
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/org/apache/storm/daemon/task.clj#L382)
- Tasks are responsible for message routing. A tuple is emitted either to a
direct stream (where the task id is specified) or a regular stream. In direct
streams, the message is only sent if that bolt subscribes to that direct
stream. In regular streams, the stream grouping functions are used to determine
the task ids to send the tuple to.
- - Tasks have a routing map from {stream id} -> {component id} -> {stream
grouping function}
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/task.clj#L198)
- - The "tasks-fn" returns the task ids to send the tuples to for either
regular stream emit or direct stream emit
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/task.clj#L207)
+ - Tasks have a routing map from {stream id} -> {component id} -> {stream
grouping function}
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/org/apache/storm/daemon/task.clj#L198)
+ - The "tasks-fn" returns the task ids to send the tuples to for either
regular stream emit or direct stream emit
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/org/apache/storm/daemon/task.clj#L207)
- After getting the output task ids, bolts and spouts use the transfer-fn
provided by the worker to actually transfer the tuples
- - Bolt transfer code here:
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/task.clj#L429)
- - Spout transfer code here:
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/task.clj#L329)
+ - Bolt transfer code here:
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/org/apache/storm/daemon/task.clj#L429)
+ - Spout transfer code here:
[code](https://github.com/apache/storm/blob/0.7.1/src/clj/org/apache/storm/daemon/task.clj#L329)
Modified: storm/branches/bobby-versioned-site/releases/1.0.0-SNAPSHOT/Metrics.md
URL:
http://svn.apache.org/viewvc/storm/branches/bobby-versioned-site/releases/1.0.0-SNAPSHOT/Metrics.md?rev=1735653&r1=1735652&r2=1735653&view=diff
==============================================================================
--- storm/branches/bobby-versioned-site/releases/1.0.0-SNAPSHOT/Metrics.md
(original)
+++ storm/branches/bobby-versioned-site/releases/1.0.0-SNAPSHOT/Metrics.md Fri
Mar 18 17:56:59 2016
@@ -12,13 +12,13 @@ Metrics have to implement just one metho
Storm gives you these metric types:
-*
[AssignableMetric]({{page.git-blob-base}}/storm-core/src/jvm/backtype/storm/metric/api/AssignableMetric.java)
-- set the metric to the explicit value you supply. Useful if it's an external
value or in the case that you are already calculating the summary statistic
yourself.
-*
[CombinedMetric]({{page.git-blob-base}}/storm-core/src/jvm/backtype/storm/metric/api/CombinedMetric.java)
-- generic interface for metrics that can be updated associatively.
-*
[CountMetric]({{page.git-blob-base}}/storm-core/src/jvm/backtype/storm/metric/api/CountMetric.java)
-- a running total of the supplied values. Call `incr()` to increment by one,
`incrBy(n)` to add/subtract the given number.
- -
[MultiCountMetric]({{page.git-blob-base}}/storm-core/src/jvm/backtype/storm/metric/api/MultiCountMetric.java)
-- a hashmap of count metrics.
-*
[ReducedMetric]({{page.git-blob-base}}/storm-core/src/jvm/backtype/storm/metric/api/ReducedMetric.java)
- -
[MeanReducer]({{page.git-blob-base}}/storm-core/src/jvm/backtype/storm/metric/api/MeanReducer.java)
-- track a running average of values given to its `reduce()` method. (It
accepts `Double`, `Integer` or `Long` values, and maintains the internal
average as a `Double`.) Despite his reputation, the MeanReducer is actually a
pretty nice guy in person.
- -
[MultiReducedMetric]({{page.git-blob-base}}/storm-core/src/jvm/backtype/storm/metric/api/MultiReducedMetric.java)
-- a hashmap of reduced metrics.
+*
[AssignableMetric]({{page.git-blob-base}}/storm-core/src/jvm/org/apache/storm/metric/api/AssignableMetric.java)
-- set the metric to the explicit value you supply. Useful if it's an external
value or in the case that you are already calculating the summary statistic
yourself.
+*
[CombinedMetric]({{page.git-blob-base}}/storm-core/src/jvm/org/apache/storm/metric/api/CombinedMetric.java)
-- generic interface for metrics that can be updated associatively.
+*
[CountMetric]({{page.git-blob-base}}/storm-core/src/jvm/org/apache/storm/metric/api/CountMetric.java)
-- a running total of the supplied values. Call `incr()` to increment by one,
`incrBy(n)` to add/subtract the given number.
+ -
[MultiCountMetric]({{page.git-blob-base}}/storm-core/src/jvm/org/apache/storm/metric/api/MultiCountMetric.java)
-- a hashmap of count metrics.
+*
[ReducedMetric]({{page.git-blob-base}}/storm-core/src/jvm/org/apache/storm/metric/api/ReducedMetric.java)
+ -
[MeanReducer]({{page.git-blob-base}}/storm-core/src/jvm/org/apache/storm/metric/api/MeanReducer.java)
-- track a running average of values given to its `reduce()` method. (It
accepts `Double`, `Integer` or `Long` values, and maintains the internal
average as a `Double`.) Despite his reputation, the MeanReducer is actually a
pretty nice guy in person.
+ -
[MultiReducedMetric]({{page.git-blob-base}}/storm-core/src/jvm/org/apache/storm/metric/api/MultiReducedMetric.java)
-- a hashmap of reduced metrics.
### Metric Consumer
@@ -30,7 +30,7 @@ Storm gives you these metric types:
### Builtin Metrics
-The [builtin
metrics]({{page.git-blob-base}}/storm-core/src/clj/backtype/storm/daemon/builtin_metrics.clj)
instrument Storm itself.
+The [builtin
metrics]({{page.git-blob-base}}/storm-core/src/clj/org/apache/storm/daemon/builtin_metrics.clj)
instrument Storm itself.
-[builtin_metrics.clj]({{page.git-blob-base}}/storm-core/src/clj/backtype/storm/daemon/builtin_metrics.clj)
sets up data structures for the built-in metrics, and facade methods that the
other framework components can use to update them. The metrics themselves are
calculated in the calling code -- see for example
[`ack-spout-msg`]({{page.git-blob-base}}/storm-core/src/clj/backtype/storm/daemon/executor.clj#358)
in `clj/b/s/daemon/daemon/executor.clj`
+[builtin_metrics.clj]({{page.git-blob-base}}/storm-core/src/clj/org/apache/storm/daemon/builtin_metrics.clj)
sets up data structures for the built-in metrics, and facade methods that the
other framework components can use to update them. The metrics themselves are
calculated in the calling code -- see for example
[`ack-spout-msg`]({{page.git-blob-base}}/storm-core/src/clj/org/apache/storm/daemon/executor.clj#358)
in `clj/b/s/daemon/daemon/executor.clj`
Modified:
storm/branches/bobby-versioned-site/releases/1.0.0-SNAPSHOT/Resource_Aware_Scheduler_overview.md
URL:
http://svn.apache.org/viewvc/storm/branches/bobby-versioned-site/releases/1.0.0-SNAPSHOT/Resource_Aware_Scheduler_overview.md?rev=1735653&r1=1735652&r2=1735653&view=diff
==============================================================================
---
storm/branches/bobby-versioned-site/releases/1.0.0-SNAPSHOT/Resource_Aware_Scheduler_overview.md
(original)
+++
storm/branches/bobby-versioned-site/releases/1.0.0-SNAPSHOT/Resource_Aware_Scheduler_overview.md
Fri Mar 18 17:56:59 2016
@@ -11,7 +11,7 @@ The purpose of this document is to provi
The user can switch to using the Resource Aware Scheduler by setting the
following in *conf/storm.yaml*
- storm.scheduler:
âbacktype.storm.scheduler.resource.ResourceAwareSchedulerâ
+ storm.scheduler:
âorg.apache.storm.scheduler.resource.ResourceAwareSchedulerâ
## API Overview
@@ -178,7 +178,7 @@ Parameters:
Example Usage:
-
conf.setTopologyStrategy(backtype.storm.scheduler.resource.strategies.scheduling.DefaultResourceAwareStrategy.class);
+
conf.setTopologyStrategy(org.apache.storm.scheduler.resource.strategies.scheduling.DefaultResourceAwareStrategy.class);
A default scheduling is provided. The DefaultResourceAwareStrategy is
implemented based off the scheduling algorithm in the original paper describing
resource aware scheduling in Storm:
@@ -188,7 +188,7 @@ http://web.engr.illinois.edu/~bpeng/file
The order of scheduling is a pluggable interface in which a user could define
a strategy that prioritizes topologies. For a user to define his or her own
prioritization strategy, he or she needs to implement the
ISchedulingPriorityStrategy interface. A user can set the scheduling priority
strategy by setting the *Config.RESOURCE_AWARE_SCHEDULER_PRIORITY_STRATEGY* to
point to the class that implements the strategy. For instance:
- resource.aware.scheduler.priority.strategy:
"backtype.storm.scheduler.resource.strategies.priority.DefaultSchedulingPriorityStrategy"
+ resource.aware.scheduler.priority.strategy:
"org.apache.storm.scheduler.resource.strategies.priority.DefaultSchedulingPriorityStrategy"
A default strategy will be provided. The following explains how the default
scheduling priority strategy works.
@@ -218,7 +218,7 @@ When scheduling, RAS sorts users by the
### Specifying Eviction Strategy
The eviction strategy is used when there are not enough free resources in the
cluster to schedule new topologies. If the cluster is full, we need a mechanism
to evict topologies so that user resource guarantees can be met and additional
resource can be shared fairly among users. The strategy for evicting topologies
is also a pluggable interface in which the user can implement his or her own
topology eviction strategy. For a user to implement his or her own eviction
strategy, he or she needs to implement the IEvictionStrategy Interface and set
*Config.RESOURCE_AWARE_SCHEDULER_EVICTION_STRATEGY* to point to the implemented
strategy class. For instance:
- resource.aware.scheduler.eviction.strategy:
"backtype.storm.scheduler.resource.strategies.eviction.DefaultEvictionStrategy"
+ resource.aware.scheduler.eviction.strategy:
"org.apache.storm.scheduler.resource.strategies.eviction.DefaultEvictionStrategy"
A default eviction strategy is provided. The following explains how the
default topology eviction strategy works
Modified:
storm/branches/bobby-versioned-site/releases/1.0.0-SNAPSHOT/Running-topologies-on-a-production-cluster.md
URL:
http://svn.apache.org/viewvc/storm/branches/bobby-versioned-site/releases/1.0.0-SNAPSHOT/Running-topologies-on-a-production-cluster.md?rev=1735653&r1=1735652&r2=1735653&view=diff
==============================================================================
---
storm/branches/bobby-versioned-site/releases/1.0.0-SNAPSHOT/Running-topologies-on-a-production-cluster.md
(original)
+++
storm/branches/bobby-versioned-site/releases/1.0.0-SNAPSHOT/Running-topologies-on-a-production-cluster.md
Fri Mar 18 17:56:59 2016
@@ -5,9 +5,9 @@ documentation: true
---
Running topologies on a production cluster is similar to running in [Local
mode](Local-mode.html). Here are the steps:
-1) Define the topology (Use
[TopologyBuilder](javadocs/backtype/storm/topology/TopologyBuilder.html) if
defining using Java)
+1) Define the topology (Use
[TopologyBuilder](javadocs/org/apache/storm/topology/TopologyBuilder.html) if
defining using Java)
-2) Use [StormSubmitter](javadocs/backtype/storm/StormSubmitter.html) to submit
the topology to the cluster. `StormSubmitter` takes as input the name of the
topology, a configuration for the topology, and the topology itself. For
example:
+2) Use [StormSubmitter](javadocs/org/apache/storm/StormSubmitter.html) to
submit the topology to the cluster. `StormSubmitter` takes as input the name of
the topology, a configuration for the topology, and the topology itself. For
example:
```java
Config conf = new Config();
@@ -47,7 +47,7 @@ You can find out how to configure your `
### Common configurations
-There are a variety of configurations you can set per topology. A list of all
the configurations you can set can be found
[here](javadocs/backtype/storm/Config.html). The ones prefixed with "TOPOLOGY"
can be overridden on a topology-specific basis (the other ones are cluster
configurations and cannot be overridden). Here are some common ones that are
set for a topology:
+There are a variety of configurations you can set per topology. A list of all
the configurations you can set can be found
[here](javadocs/org/apache/storm/Config.html). The ones prefixed with
"TOPOLOGY" can be overridden on a topology-specific basis (the other ones are
cluster configurations and cannot be overridden). Here are some common ones
that are set for a topology:
1. **Config.TOPOLOGY_WORKERS**: This sets the number of worker processes to
use to execute the topology. For example, if you set this to 25, there will be
25 Java processes across the cluster executing all the tasks. If you had a
combined 150 parallelism across all components in the topology, each worker
process will have 6 tasks running within it as threads.
2. **Config.TOPOLOGY_ACKER_EXECUTORS**: This sets the number of executors that
will track tuple trees and detect when a spout tuple has been fully processed.
Ackers are an integral part of Storm's reliability model and you can read more
about them on [Guaranteeing message
processing](Guaranteeing-message-processing.html). By not setting this variable
or setting it as null, Storm will set the number of acker executors to be equal
to the number of workers configured for this topology. If this variable is set
to 0, then Storm will immediately ack tuples as soon as they come off the
spout, effectively disabling reliability.
Modified:
storm/branches/bobby-versioned-site/releases/1.0.0-SNAPSHOT/SECURITY.md
URL:
http://svn.apache.org/viewvc/storm/branches/bobby-versioned-site/releases/1.0.0-SNAPSHOT/SECURITY.md?rev=1735653&r1=1735652&r2=1735653&view=diff
==============================================================================
--- storm/branches/bobby-versioned-site/releases/1.0.0-SNAPSHOT/SECURITY.md
(original)
+++ storm/branches/bobby-versioned-site/releases/1.0.0-SNAPSHOT/SECURITY.md Fri
Mar 18 17:56:59 2016
@@ -172,7 +172,7 @@ Each jaas file may have multiple section
To enable Kerberos authentication in storm you need to set the following
storm.yaml configs
```yaml
-storm.thrift.transport:
"backtype.storm.security.auth.kerberos.KerberosSaslTransportPlugin"
+storm.thrift.transport:
"org.apache.storm.security.auth.kerberos.KerberosSaslTransportPlugin"
java.security.auth.login.config: "/path/to/jaas.conf"
```
@@ -275,7 +275,7 @@ Server {
Nimbus also will translate the principal into a local user name, so that other
services can use this name. To configure this for Kerberos authentication set
```
-storm.principal.tolocal:
"backtype.storm.security.auth.KerberosPrincipalToLocal"
+storm.principal.tolocal:
"org.apache.storm.security.auth.KerberosPrincipalToLocal"
```
This only needs to be done on nimbus, but it will not hurt on any node.
@@ -324,7 +324,7 @@ The end user can override this if they h
The preferred authorization plug-in for nimbus is The *SimpleACLAuthorizer*.
To use the *SimpleACLAuthorizer*, set the following:
```yaml
-nimbus.authorizer:
"backtype.storm.security.auth.authorizer.SimpleACLAuthorizer"
+nimbus.authorizer:
"org.apache.storm.security.auth.authorizer.SimpleACLAuthorizer"
```
DRPC has a separate authorizer configuration for it. Do not use
SimpleACLAuthorizer for DRPC.
@@ -348,7 +348,7 @@ To ensure isolation of users in multi-te
To support multi-tenancy better we have written a new scheduler. To enable
this scheduler set.
```yaml
-storm.scheduler: "backtype.storm.scheduler.multitenant.MultitenantScheduler"
+storm.scheduler: "org.apache.storm.scheduler.multitenant.MultitenantScheduler"
```
Be aware that many of the features of this scheduler rely on storm
authentication. Without them the scheduler will not know what the user is and
will not isolate topologies properly.
@@ -391,11 +391,11 @@ A storm client may submit requests on be
it can do so by leveraging the impersonation feature.In order to submit
topology as some other user , you can use `StormSubmitter.submitTopologyAs`
API. Alternatively you can use `NimbusClient.getConfiguredClientAs`
to get a nimbus client as some other user and perform any nimbus action(i.e.
kill/rebalance/activate/deactivate) using this client.
-To ensure only authorized users can perform impersonation you should start
nimbus with `nimbus.impersonation.authorizer` set to
`backtype.storm.security.auth.authorizer.ImpersonationAuthorizer`.
+To ensure only authorized users can perform impersonation you should start
nimbus with `nimbus.impersonation.authorizer` set to
`org.apache.storm.security.auth.authorizer.ImpersonationAuthorizer`.
The `ImpersonationAuthorizer` uses `nimbus.impersonation.acl` as the acl to
authorize users. Following is a sample nimbus config for supporting
impersonation:
```yaml
-nimbus.impersonation.authorizer:
backtype.storm.security.auth.authorizer.ImpersonationAuthorizer
+nimbus.impersonation.authorizer:
org.apache.storm.security.auth.authorizer.ImpersonationAuthorizer
nimbus.impersonation.acl:
impersonating_user1:
hosts:
@@ -423,7 +423,7 @@ nimbus.impersonation.acl:
Individual topologies have the ability to push credentials (tickets and
tokens) to workers so that they can access secure services. Exposing this to
all of the users can be a pain for them.
To hide this from them in the common case plugins can be used to populate the
credentials, unpack them on the other side into a java Subject, and also allow
Nimbus to renew the credentials if needed.
These are controlled by the following configs. topology.auto-credentials is a
list of java plugins, all of which must implement IAutoCredentials interface,
that populate the credentials on gateway
-and unpack them on the worker side. On a kerberos secure cluster they should
be set by default to point to backtype.storm.security.auth.kerberos.AutoTGT.
+and unpack them on the worker side. On a kerberos secure cluster they should
be set by default to point to org.apache.storm.security.auth.kerberos.AutoTGT.
nimbus.credential.renewers.classes should also be set to this value so that
nimbus can periodically renew the TGT on behalf of the user.
nimbus.credential.renewers.freq.secs controls how often the renewer will poll
to see if anything needs to be renewed, but the default should be fine.