Modified: storm/branches/bobby-versioned-site/releases/0.9.6/Transactional-topologies.md URL: http://svn.apache.org/viewvc/storm/branches/bobby-versioned-site/releases/0.9.6/Transactional-topologies.md?rev=1735492&r1=1735491&r2=1735492&view=diff ============================================================================== --- storm/branches/bobby-versioned-site/releases/0.9.6/Transactional-topologies.md (original) +++ storm/branches/bobby-versioned-site/releases/0.9.6/Transactional-topologies.md Thu Mar 17 20:19:17 2016 @@ -1,5 +1,6 @@ --- layout: documentation +version: v0.9.6 --- **NOTE**: Transactional topologies have been deprecated -- use the [Trident](Trident-tutorial.html) framework instead. @@ -79,7 +80,7 @@ Finally, another thing to note is that t ## The basics through example -You build transactional topologies by using [TransactionalTopologyBuilder](/apidocs/backtype/storm/transactional/TransactionalTopologyBuilder.html). Here's the transactional topology definition for a topology that computes the global count of tuples from the input stream. This code comes from [TransactionalGlobalCount](https://github.com/nathanmarz/storm-starter/blob/master/src/jvm/storm/starter/TransactionalGlobalCount.java) in storm-starter. +You build transactional topologies by using [TransactionalTopologyBuilder](javadocs/backtype/storm/transactional/TransactionalTopologyBuilder.html). Here's the transactional topology definition for a topology that computes the global count of tuples from the input stream. This code comes from [TransactionalGlobalCount](https://github.com/nathanmarz/storm-starter/blob/master/src/jvm/storm/starter/TransactionalGlobalCount.java) in storm-starter. ```java MemoryTransactionalSpout spout = new MemoryTransactionalSpout(DATA, new Fields("word"), PARTITION_TAKE_PER_BATCH); @@ -130,7 +131,7 @@ public static class BatchCount extends B A new instance of this object is created for every batch that's being processed. 
The actual bolt this runs within is called [BatchBoltExecutor](https://github.com/apache/incubator-storm/blob/0.7.0/src/jvm/backtype/storm/coordination/BatchBoltExecutor.java) and manages the creation and cleanup for these objects. -The `prepare` method parameterizes this batch bolt with the Storm config, the topology context, an output collector, and the id for this batch of tuples. In the case of transactional topologies, the id will be a [TransactionAttempt](/apidocs/backtype/storm/transactional/TransactionAttempt.html) object. The batch bolt abstraction can be used in Distributed RPC as well which uses a different type of id for the batches. `BatchBolt` can actually be parameterized with the type of the id, so if you only intend to use the batch bolt for transactional topologies, you can extend `BaseTransactionalBolt` which has this definition: +The `prepare` method parameterizes this batch bolt with the Storm config, the topology context, an output collector, and the id for this batch of tuples. In the case of transactional topologies, the id will be a [TransactionAttempt](javadocs/backtype/storm/transactional/TransactionAttempt.html) object. The batch bolt abstraction can be used in Distributed RPC as well which uses a different type of id for the batches. `BatchBolt` can actually be parameterized with the type of the id, so if you only intend to use the batch bolt for transactional topologies, you can extend `BaseTransactionalBolt` which has this definition: ```java public abstract class BaseTransactionalBolt extends BaseBatchBolt<TransactionAttempt> { @@ -209,9 +210,9 @@ This section outlines the different piec There are three kinds of bolts possible in a transactional topology: -1. [BasicBolt](/apidocs/backtype/storm/topology/base/BaseBasicBolt.html): This bolt doesn't deal with batches of tuples and just emits tuples based on a single tuple of input. -2. 
[BatchBolt](/apidocs/backtype/storm/topology/base/BaseBatchBolt.html): This bolt processes batches of tuples. `execute` is called for each tuple, and `finishBatch` is called when the batch is complete. -3. BatchBolt's that are marked as committers: The only difference between this bolt and a regular batch bolt is when `finishBatch` is called. A committer bolt has `finishedBatch` called during the commit phase. The commit phase is guaranteed to occur only after all prior batches have successfully committed, and it will be retried until all bolts in the topology succeed the commit for the batch. There are two ways to make a `BatchBolt` a committer, by having the `BatchBolt` implement the [ICommitter](/apidocs/backtype/storm/transactional/ICommitter.html) marker interface, or by using the `setCommiterBolt` method in `TransactionalTopologyBuilder`. +1. [BasicBolt](javadocs/backtype/storm/topology/base/BaseBasicBolt.html): This bolt doesn't deal with batches of tuples and just emits tuples based on a single tuple of input. +2. [BatchBolt](javadocs/backtype/storm/topology/base/BaseBatchBolt.html): This bolt processes batches of tuples. `execute` is called for each tuple, and `finishBatch` is called when the batch is complete. +3. BatchBolt's that are marked as committers: The only difference between this bolt and a regular batch bolt is when `finishBatch` is called. A committer bolt has `finishedBatch` called during the commit phase. The commit phase is guaranteed to occur only after all prior batches have successfully committed, and it will be retried until all bolts in the topology succeed the commit for the batch. There are two ways to make a `BatchBolt` a committer, by having the `BatchBolt` implement the [ICommitter](javadocs/backtype/storm/transactional/ICommitter.html) marker interface, or by using the `setCommiterBolt` method in `TransactionalTopologyBuilder`. #### Processing phase vs. 
commit phase in bolts @@ -235,7 +236,7 @@ Notice that you don't have to do any ack #### Failing a transaction -When using regular bolts, you can call the `fail` method on `OutputCollector` to fail the tuple trees of which that tuple is a member. Since transactional topologies hide the acking framework from you, they provide a different mechanism to fail a batch (and cause the batch to be replayed). Just throw a [FailedException](/apidocs/backtype/storm/topology/FailedException.html). Unlike regular exceptions, this will only cause that particular batch to replay and will not crash the process. +When using regular bolts, you can call the `fail` method on `OutputCollector` to fail the tuple trees of which that tuple is a member. Since transactional topologies hide the acking framework from you, they provide a different mechanism to fail a batch (and cause the batch to be replayed). Just throw a [FailedException](javadocs/backtype/storm/topology/FailedException.html). Unlike regular exceptions, this will only cause that particular batch to replay and will not crash the process. ### Transactional spout @@ -249,11 +250,11 @@ The coordinator on the left is a regular The need to be idempotent with respect to the tuples it emits requires a `TransactionalSpout` to store a small amount of state. The state is stored in Zookeeper. -The details of implementing a `TransactionalSpout` are in [the Javadoc](/apidocs/backtype/storm/transactional/ITransactionalSpout.html). +The details of implementing a `TransactionalSpout` are in [the Javadoc](javadocs/backtype/storm/transactional/ITransactionalSpout.html). #### Partitioned Transactional Spout -A common kind of transactional spout is one that reads the batches from a set of partitions across many queue brokers. For example, this is how [TransactionalKafkaSpout](https://github.com/nathanmarz/storm-contrib/blob/master/storm-kafka/src/jvm/storm/kafka/TransactionalKafkaSpout.java) works. 
An `IPartitionedTransactionalSpout` automates the bookkeeping work of managing the state for each partition to ensure idempotent replayability. See [the Javadoc](/apidocs/backtype/storm/transactional/partitioned/IPartitionedTransactionalSpout.html) for more details. +A common kind of transactional spout is one that reads the batches from a set of partitions across many queue brokers. For example, this is how [TransactionalKafkaSpout](https://github.com/nathanmarz/storm-contrib/blob/master/storm-kafka/src/jvm/storm/kafka/TransactionalKafkaSpout.java) works. An `IPartitionedTransactionalSpout` automates the bookkeeping work of managing the state for each partition to ensure idempotent replayability. See [the Javadoc](javadocs/backtype/storm/transactional/partitioned/IPartitionedTransactionalSpout.html) for more details. ### Configuration @@ -323,7 +324,7 @@ In this scenario, tuples 41-50 are skipp By failing all subsequent transactions on failure, no tuples are skipped. This also shows that a requirement of transactional spouts is that they always emit where the last transaction left off. -A non-idempotent transactional spout is more concisely referred to as an "OpaqueTransactionalSpout" (opaque is the opposite of idempotent). [IOpaquePartitionedTransactionalSpout](/apidocs/backtype/storm/transactional/partitioned/IOpaquePartitionedTransactionalSpout.html) is an interface for implementing opaque partitioned transactional spouts, of which [OpaqueTransactionalKafkaSpout](https://github.com/nathanmarz/storm-contrib/blob/kafka0.7/storm-kafka/src/jvm/storm/kafka/OpaqueTransactionalKafkaSpout.java) is an example. `OpaqueTransactionalKafkaSpout` can withstand losing individual Kafka nodes without sacrificing accuracy as long as you use the update strategy as explained in this section. +A non-idempotent transactional spout is more concisely referred to as an "OpaqueTransactionalSpout" (opaque is the opposite of idempotent). 
[IOpaquePartitionedTransactionalSpout](javadocs/backtype/storm/transactional/partitioned/IOpaquePartitionedTransactionalSpout.html) is an interface for implementing opaque partitioned transactional spouts, of which [OpaqueTransactionalKafkaSpout](https://github.com/nathanmarz/storm-contrib/blob/kafka0.7/storm-kafka/src/jvm/storm/kafka/OpaqueTransactionalKafkaSpout.java) is an example. `OpaqueTransactionalKafkaSpout` can withstand losing individual Kafka nodes without sacrificing accuracy as long as you use the update strategy as explained in this section. ## Implementation
Modified: storm/branches/bobby-versioned-site/releases/0.9.6/Trident-API-Overview.md URL: http://svn.apache.org/viewvc/storm/branches/bobby-versioned-site/releases/0.9.6/Trident-API-Overview.md?rev=1735492&r1=1735491&r2=1735492&view=diff ============================================================================== --- storm/branches/bobby-versioned-site/releases/0.9.6/Trident-API-Overview.md (original) +++ storm/branches/bobby-versioned-site/releases/0.9.6/Trident-API-Overview.md Thu Mar 17 20:19:17 2016 @@ -1,5 +1,6 @@ --- layout: documentation +version: v0.9.6 --- # Trident API overview @@ -308,4 +309,4 @@ When a join happens between streams orig You might be wondering -- how do you do something like a "windowed join", where tuples from one side of the join are joined against the last hour of tuples from the other side of the join. -To do this, you would make use of partitionPersist and stateQuery. The last hour of tuples from one side of the join would be stored and rotated in a source of state, keyed by the join field. Then the stateQuery would do lookups by the join field to perform the "join". \ No newline at end of file +To do this, you would make use of partitionPersist and stateQuery. The last hour of tuples from one side of the join would be stored and rotated in a source of state, keyed by the join field. Then the stateQuery would do lookups by the join field to perform the "join". 
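The "windowed join" idea in the Trident-API-Overview.md hunk above (store one side of the join in state keyed by the join field, rotate out old entries, and look up from the other side) can be sketched in plain Java. This is only an illustration under stated assumptions: `WindowedJoinState`, `store`, and `lookup` are hypothetical names standing in for the roles that partitionPersist and stateQuery play, not Trident APIs, and timestamps are assumed to be non-decreasing per key.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory stand-in for a state source keyed by the join field. */
final class WindowedJoinState {
    /** One buffered tuple value together with its arrival time. */
    private static final class Timed {
        final long timestampMs;
        final String value;

        Timed(long timestampMs, String value) {
            this.timestampMs = timestampMs;
            this.value = value;
        }
    }

    private final long windowMs;
    private final Map<String, Deque<Timed>> byKey = new HashMap<>();

    WindowedJoinState(long windowMs) {
        this.windowMs = windowMs;
    }

    /** Buffers a tuple from one side of the join (the role partitionPersist plays). */
    void store(String joinKey, String value, long nowMs) {
        byKey.computeIfAbsent(joinKey, k -> new ArrayDeque<>())
             .addLast(new Timed(nowMs, value));
    }

    /** Returns buffered values for the key that are still inside the window (the stateQuery role). */
    List<String> lookup(String joinKey, long nowMs) {
        List<String> matches = new ArrayList<>();
        Deque<Timed> entries = byKey.get(joinKey);
        if (entries == null) {
            return matches;
        }
        // Rotate out entries older than the window before answering the query.
        while (!entries.isEmpty() && nowMs - entries.peekFirst().timestampMs > windowMs) {
            entries.removeFirst();
        }
        for (Timed t : entries) {
            matches.add(t.value);
        }
        return matches;
    }
}
```

A real state implementation would live in an external store and rotate entries in the background; the sketch only shows the keyed store-then-lookup shape.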
Modified: storm/branches/bobby-versioned-site/releases/0.9.6/Trident-spouts.md URL: http://svn.apache.org/viewvc/storm/branches/bobby-versioned-site/releases/0.9.6/Trident-spouts.md?rev=1735492&r1=1735491&r2=1735492&view=diff ============================================================================== --- storm/branches/bobby-versioned-site/releases/0.9.6/Trident-spouts.md (original) +++ storm/branches/bobby-versioned-site/releases/0.9.6/Trident-spouts.md Thu Mar 17 20:19:17 2016 @@ -1,5 +1,6 @@ --- layout: documentation +version: v0.9.6 --- # Trident spouts Modified: storm/branches/bobby-versioned-site/releases/0.9.6/Trident-state.md URL: http://svn.apache.org/viewvc/storm/branches/bobby-versioned-site/releases/0.9.6/Trident-state.md?rev=1735492&r1=1735491&r2=1735492&view=diff ============================================================================== --- storm/branches/bobby-versioned-site/releases/0.9.6/Trident-state.md (original) +++ storm/branches/bobby-versioned-site/releases/0.9.6/Trident-state.md Thu Mar 17 20:19:17 2016 @@ -1,5 +1,6 @@ --- layout: documentation +version: v0.9.6 --- # State in Trident @@ -27,7 +28,7 @@ Remember, Trident processes tuples as sm 2. There's no overlap between batches of tuples (tuples are in one batch or another, never multiple). 3. Every tuple is in a batch (no tuples are skipped) -This is a pretty easy type of spout to understand, the stream is divided into fixed batches that never change. storm-contrib has [an implementation of a transactional spout](https://github.com/nathanmarz/storm-contrib/blob/master/storm-kafka/src/jvm/storm/kafka/trident/TransactionalTridentKafkaSpout.java) for Kafka. +This is a pretty easy type of spout to understand, the stream is divided into fixed batches that never change. storm-contrib has [an implementation of a transactional spout](https://github.com/nathanmarz/storm-contrib/blob/{{page.version}}/storm-kafka/src/jvm/storm/kafka/trident/TransactionalTridentKafkaSpout.java) for Kafka. 
You might be wondering -- why wouldn't you just always use a transactional spout? They're simple and easy to understand. One reason you might not use one is because they're not necessarily very fault-tolerant. For example, the way TransactionalTridentKafkaSpout works is the batch for a txid will contain tuples from all the Kafka partitions for a topic. Once a batch has been emitted, any time that batch is re-emitted in the future the exact same set of tuples must be emitted to meet the semantics of transactional spouts. Now suppose a batch is emitted from TransactionalTridentKafkaSpout, the batch fails to process, and at the same time one of the Kafka nodes goes down. You're now incapable of replaying the same batch as you did before (since the node is down and some partitions for the topic are not unavailable), and processing will halt. @@ -71,7 +72,7 @@ As described before, an opaque transacti 1. Every tuple is *successfully* processed in exactly one batch. However, it's possible for a tuple to fail to process in one batch and then succeed to process in a later batch. -[OpaqueTridentKafkaSpout](https://github.com/nathanmarz/storm-contrib/blob/master/storm-kafka/src/jvm/storm/kafka/trident/OpaqueTridentKafkaSpout.java) is a spout that has this property and is fault-tolerant to losing Kafka nodes. Whenever it's time for OpaqueTridentKafkaSpout to emit a batch, it emits tuples starting from where the last batch finished emitting. This ensures that no tuple is ever skipped or successfully processed by multiple batches. +[OpaqueTridentKafkaSpout](https://github.com/nathanmarz/storm-contrib/blob/{{page.version}}/storm-kafka/src/jvm/storm/kafka/trident/OpaqueTridentKafkaSpout.java) is a spout that has this property and is fault-tolerant to losing Kafka nodes. Whenever it's time for OpaqueTridentKafkaSpout to emit a batch, it emits tuples starting from where the last batch finished emitting. 
This ensures that no tuple is ever skipped or successfully processed by multiple batches. With opaque transactional spouts, it's no longer possible to use the trick of skipping state updates if the transaction id in the database is the same as the transaction id for the current batch. This is because the batch may have changed between state updates. @@ -308,7 +309,7 @@ public interface Snapshottable<T> extend } ``` -[MemoryMapState](https://github.com/apache/incubator-storm/blob/master/storm-core/src/jvm/storm/trident/testing/MemoryMapState.java) and [MemcachedState](https://github.com/nathanmarz/trident-memcached/blob/master/src/jvm/trident/memcached/MemcachedState.java) each implement both of these interfaces. +[MemoryMapState](https://github.com/apache/incubator-storm/blob/{{page.version}}/storm-core/src/jvm/storm/trident/testing/MemoryMapState.java) and [MemcachedState](https://github.com/nathanmarz/trident-memcached/blob/master/src/jvm/trident/memcached/MemcachedState.java) each implement both of these interfaces. ## Implementing Map States @@ -321,10 +322,10 @@ public interface IBackingMap<T> { } ``` -OpaqueMap's will call multiPut with [OpaqueValue](https://github.com/apache/incubator-storm/blob/master/storm-core/src/jvm/storm/trident/state/OpaqueValue.java)'s for the vals, TransactionalMap's will give [TransactionalValue](https://github.com/apache/incubator-storm/blob/master/storm-core/src/jvm/storm/trident/state/TransactionalValue.java)'s for the vals, and NonTransactionalMaps will just pass the objects from the topology through. 
+OpaqueMap's will call multiPut with [OpaqueValue](https://github.com/apache/incubator-storm/blob/{{page.version}}/storm-core/src/jvm/storm/trident/state/OpaqueValue.java)'s for the vals, TransactionalMap's will give [TransactionalValue](https://github.com/apache/incubator-storm/blob/{{page.version}}/storm-core/src/jvm/storm/trident/state/TransactionalValue.java)'s for the vals, and NonTransactionalMaps will just pass the objects from the topology through. -Trident also provides the [CachedMap](https://github.com/apache/incubator-storm/blob/master/storm-core/src/jvm/storm/trident/state/map/CachedMap.java) class to do automatic LRU caching of map key/vals. +Trident also provides the [CachedMap](https://github.com/apache/incubator-storm/blob/{{page.version}}/storm-core/src/jvm/storm/trident/state/map/CachedMap.java) class to do automatic LRU caching of map key/vals. -Finally, Trident provides the [SnapshottableMap](https://github.com/apache/incubator-storm/blob/master/storm-core/src/jvm/storm/trident/state/map/SnapshottableMap.java) class that turns a MapState into a Snapshottable object, by storing global aggregations into a fixed key. +Finally, Trident provides the [SnapshottableMap](https://github.com/apache/incubator-storm/blob/{{page.version}}/storm-core/src/jvm/storm/trident/state/map/SnapshottableMap.java) class that turns a MapState into a Snapshottable object, by storing global aggregations into a fixed key. Take a look at the implementation of [MemcachedState](https://github.com/nathanmarz/trident-memcached/blob/master/src/jvm/trident/memcached/MemcachedState.java) to see how all these utilities can be put together to make a high performance MapState implementation. MemcachedState allows you to choose between opaque transactional, transactional, and non-transactional semantics. 
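The Trident-state.md hunks above point out that with opaque transactional spouts the "skip if the txid matches" trick no longer works, because a replayed batch may contain different tuples. A minimal sketch of the usual alternative, keeping the previous value next to the current value and the txid, is shown below. `OpaqueCount` and its fields are hypothetical names for illustration and are not the `OpaqueValue` class linked in the diff; the sketch also assumes batches are committed in strict txid order.

```java
/** Hypothetical counter that tolerates a batch being replayed with different contents. */
final class OpaqueCount {
    private long txid = -1L; // txid of the batch that produced curr (-1 means none yet)
    private long prev = 0L;  // value before that batch was applied
    private long curr = 0L;  // value after that batch was applied

    /** Applies the partial count of one batch attempt. */
    void applyBatch(long batchTxid, long delta) {
        if (batchTxid == txid) {
            // Same txid again: the earlier attempt may have held other tuples,
            // so discard its effect and rebuild from the value before it.
            curr = prev + delta;
        } else {
            // New txid: the stored value is final for the earlier batch.
            prev = curr;
            curr = curr + delta;
            txid = batchTxid;
        }
    }

    long value() {
        return curr;
    }
}
```

For example, applying txid 1 with 5, then txid 2 with 3, then txid 2 again with 4 leaves the value at 9, since the second attempt for txid 2 replaces the first rather than adding to it.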
Modified: storm/branches/bobby-versioned-site/releases/0.9.6/Trident-tutorial.md URL: http://svn.apache.org/viewvc/storm/branches/bobby-versioned-site/releases/0.9.6/Trident-tutorial.md?rev=1735492&r1=1735491&r2=1735492&view=diff ============================================================================== --- storm/branches/bobby-versioned-site/releases/0.9.6/Trident-tutorial.md (original) +++ storm/branches/bobby-versioned-site/releases/0.9.6/Trident-tutorial.md Thu Mar 17 20:19:17 2016 @@ -1,5 +1,6 @@ --- layout: documentation +version: v0.9.6 --- # Trident tutorial @@ -234,7 +235,7 @@ Trident solves this problem by doing two With these two primitives, you can achieve exactly-once semantics with your state updates. Rather than store just the count in the database, what you can do instead is store the transaction id with the count in the database as an atomic value. Then, when updating the count, you can just compare the transaction id in the database with the transaction id for the current batch. If they're the same, you skip the update -- because of the strong ordering, you know for sure that the value in the database incorporates the current batch. If they're different, you increment the count. -Of course, you don't have to do this logic manually in your topologies. This logic is wrapped by the State abstraction and done automatically. Nor is your State object required to implement the transaction id trick: if you don't want to pay the cost of storing the transaction id in the database, you don't have to. In that case the State will have at-least-once-processing semantics in the case of failures (which may be fine for your application). You can read more about how to implement a State and the various fault-tolerance tradeoffs possible [in this doc](/documentation/Trident-state). +Of course, you don't have to do this logic manually in your topologies. This logic is wrapped by the State abstraction and done automatically. 
Nor is your State object required to implement the transaction id trick: if you don't want to pay the cost of storing the transaction id in the database, you don't have to. In that case the State will have at-least-once-processing semantics in the case of failures (which may be fine for your application). You can read more about how to implement a State and the various fault-tolerance tradeoffs possible [in this doc](Trident-state.html). A State is allowed to use whatever strategy it wants to store state. So it could store state in an external database or it could keep the state in-memory but backed by HDFS (like how HBase works). State's are not required to hold onto state forever. For example, you could have an in-memory State implementation that only keeps the last X hours of data available and drops anything older. Take a look at the implementation of the [Memcached integration](https://github.com/nathanmarz/trident-memcached/blob/master/src/jvm/trident/memcached/MemcachedState.java) for an example State implementation. @@ -250,4 +251,4 @@ It would compile into Storm spouts/bolts ## Conclusion -Trident makes realtime computation elegant. You've seen how high throughput stream processing, state manipulation, and low-latency querying can be seamlessly intermixed via Trident's API. Trident lets you express your realtime computations in a natural way while still getting maximal performance. \ No newline at end of file +Trident makes realtime computation elegant. You've seen how high throughput stream processing, state manipulation, and low-latency querying can be seamlessly intermixed via Trident's API. Trident lets you express your realtime computations in a natural way while still getting maximal performance. 
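The transaction id trick described in the Trident-tutorial.md hunk above (store the txid with the count as one atomic value, and skip the update when the stored txid equals the current batch's txid) can be sketched as follows. The class `TxCountStore` and its in-memory map are assumptions made for illustration; in the tutorial's setting the row would live in a database and the State abstraction would perform this logic for you.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory stand-in for a database table of (count, txid) rows. */
final class TxCountStore {
    // Each row is stored as one value: row[0] = count, row[1] = txid of the last applied batch.
    private final Map<String, long[]> rows = new HashMap<>();

    /** Adds a batch's partial count unless that txid has already been applied to the row. */
    void applyBatch(String key, long txid, long delta) {
        long[] row = rows.get(key);
        if (row != null && row[1] == txid) {
            // Replayed batch: with strongly ordered commits, the stored count
            // already incorporates this batch, so the update is skipped.
            return;
        }
        long current = (row == null) ? 0L : row[0];
        rows.put(key, new long[] { current + delta, txid });
    }

    /** Returns the stored count, or 0 when the key has never been updated. */
    long count(String key) {
        long[] row = rows.get(key);
        return (row == null) ? 0L : row[0];
    }
}
```

The skip is only safe because a replayed batch carries the same txid and the same tuples, and because batches commit in order; without those guarantees the comparison would not prove the update was already applied.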
Modified: storm/branches/bobby-versioned-site/releases/0.9.6/Troubleshooting.md URL: http://svn.apache.org/viewvc/storm/branches/bobby-versioned-site/releases/0.9.6/Troubleshooting.md?rev=1735492&r1=1735491&r2=1735492&view=diff ============================================================================== --- storm/branches/bobby-versioned-site/releases/0.9.6/Troubleshooting.md (original) +++ storm/branches/bobby-versioned-site/releases/0.9.6/Troubleshooting.md Thu Mar 17 20:19:17 2016 @@ -1,5 +1,6 @@ --- layout: documentation +version: v0.9.6 --- ## Troubleshooting @@ -141,4 +142,4 @@ Caused by: java.lang.NullPointerExceptio Solution: - * This is caused by having multiple threads issue methods on the `OutputCollector`. All emits, acks, and fails must happen on the same thread. One subtle way this can happen is if you make a `IBasicBolt` that emits on a separate thread. `IBasicBolt`'s automatically ack after execute is called, so this would cause multiple threads to use the `OutputCollector` leading to this exception. When using a basic bolt, all emits must happen in the same thread that runs `execute`. \ No newline at end of file + * This is caused by having multiple threads issue methods on the `OutputCollector`. All emits, acks, and fails must happen on the same thread. One subtle way this can happen is if you make a `IBasicBolt` that emits on a separate thread. `IBasicBolt`'s automatically ack after execute is called, so this would cause multiple threads to use the `OutputCollector` leading to this exception. When using a basic bolt, all emits must happen in the same thread that runs `execute`. 
Modified: storm/branches/bobby-versioned-site/releases/0.9.6/Tutorial.md URL: http://svn.apache.org/viewvc/storm/branches/bobby-versioned-site/releases/0.9.6/Tutorial.md?rev=1735492&r1=1735491&r2=1735492&view=diff ============================================================================== --- storm/branches/bobby-versioned-site/releases/0.9.6/Tutorial.md (original) +++ storm/branches/bobby-versioned-site/releases/0.9.6/Tutorial.md Thu Mar 17 20:19:17 2016 @@ -1,5 +1,6 @@ --- layout: documentation +version: v0.9.6 --- In this tutorial, you'll learn how to create Storm topologies and deploy them to a Storm cluster. Java will be the main language used, but a few examples will use Python to illustrate Storm's multi-language capabilities. @@ -101,11 +102,11 @@ This topology contains a spout and two b This code defines the nodes using the `setSpout` and `setBolt` methods. These methods take as input a user-specified id, an object containing the processing logic, and the amount of parallelism you want for the node. In this example, the spout is given id "words" and the bolts are given ids "exclaim1" and "exclaim2". -The object containing the processing logic implements the [IRichSpout](/apidocs/backtype/storm/topology/IRichSpout.html) interface for spouts and the [IRichBolt](/apidocs/backtype/storm/topology/IRichBolt.html) interface for bolts. +The object containing the processing logic implements the [IRichSpout](javadocs/backtype/storm/topology/IRichSpout.html) interface for spouts and the [IRichBolt](javadocs/backtype/storm/topology/IRichBolt.html) interface for bolts. The last parameter, how much parallelism you want for the node, is optional. It indicates how many threads should execute that component across the cluster. If you omit it, Storm will only allocate one thread for that node. -`setBolt` returns an [InputDeclarer](/apidocs/backtype/storm/topology/InputDeclarer.html) object that is used to define the inputs to the Bolt. 
Here, component "exclaim1" declares that it wants to read all the tuples emitted by component "words" using a shuffle grouping, and component "exclaim2" declares that it wants to read all the tuples emitted by component "exclaim1" using a shuffle grouping. "shuffle grouping" means that tuples should be randomly distributed from the input tasks to the bolt's tasks. There are many ways to group data between components. These will be explained in a few sections. +`setBolt` returns an [InputDeclarer](javadocs/backtype/storm/topology/InputDeclarer.html) object that is used to define the inputs to the Bolt. Here, component "exclaim1" declares that it wants to read all the tuples emitted by component "words" using a shuffle grouping, and component "exclaim2" declares that it wants to read all the tuples emitted by component "exclaim1" using a shuffle grouping. "shuffle grouping" means that tuples should be randomly distributed from the input tasks to the bolt's tasks. There are many ways to group data between components. These will be explained in a few sections. If you wanted component "exclaim2" to read all the tuples emitted by both component "words" and component "exclaim1", you would write component "exclaim2"'s definition like this: @@ -161,7 +162,7 @@ public static class ExclamationBolt impl The `prepare` method provides the bolt with an `OutputCollector` that is used for emitting tuples from this bolt. Tuples can be emitted at anytime from the bolt -- in the `prepare`, `execute`, or `cleanup` methods, or even asynchronously in another thread. This `prepare` implementation simply saves the `OutputCollector` as an instance variable to be used later on in the `execute` method. -The `execute` method receives a tuple from one of the bolt's inputs. The `ExclamationBolt` grabs the first field from the tuple and emits a new tuple with the string "!!!" appended to it. 
If you implement a bolt that subscribes to multiple input sources, you can find out which component the [Tuple](/apidocs/backtype/storm/tuple/Tuple.html) came from by using the `Tuple#getSourceComponent` method. +The `execute` method receives a tuple from one of the bolt's inputs. The `ExclamationBolt` grabs the first field from the tuple and emits a new tuple with the string "!!!" appended to it. If you implement a bolt that subscribes to multiple input sources, you can find out which component the [Tuple](javadocs/backtype/storm/tuple/Tuple.html) came from by using the `Tuple#getSourceComponent` method. There's a few other things going in in the `execute` method, namely that the input tuple is passed as the first argument to `emit` and the input tuple is acked on the final line. These are part of Storm's reliability API for guaranteeing no data loss and will be explained later in this tutorial. @@ -223,7 +224,7 @@ The configuration is used to tune variou 1. **TOPOLOGY_WORKERS** (set with `setNumWorkers`) specifies how many _processes_ you want allocated around the cluster to execute the topology. Each component in the topology will execute as many _threads_. The number of threads allocated to a given component is configured through the `setBolt` and `setSpout` methods. Those _threads_ exist within worker _processes_. Each worker _process_ contains within it some number of _threads_ for some number of components. For instance, you may have 300 threads specified across all your components and 50 worker processes specified in your config. Each worker process will execute 6 threads, each of which of could belong to a different component. You tune the performance of Storm topologies by tweaking the parallelism for each component and the number of worker processes those threads should run within. 2. **TOPOLOGY_DEBUG** (set with `setDebug`), when set to true, tells Storm to log every message every emitted by a component. 
This is useful in local mode when testing topologies, but you probably want to keep this turned off when running topologies on the cluster. -There's many other configurations you can set for the topology. The various configurations are detailed on [the Javadoc for Config](/apidocs/backtype/storm/Config.html). +There's many other configurations you can set for the topology. The various configurations are detailed on [the Javadoc for Config](javadocs/backtype/storm/Config.html). To learn about how to set up your development environment so that you can run topologies in local mode (such as in Eclipse), see [Creating a new Storm project](Creating-a-new-Storm-project.html). @@ -307,4 +308,4 @@ This tutorial showed how to do basic str ## Conclusion -This tutorial gave a broad overview of developing, testing, and deploying Storm topologies. The rest of the documentation dives deeper into all the aspects of using Storm. \ No newline at end of file +This tutorial gave a broad overview of developing, testing, and deploying Storm topologies. The rest of the documentation dives deeper into all the aspects of using Storm. 
Modified: storm/branches/bobby-versioned-site/releases/0.9.6/Understanding-the-parallelism-of-a-Storm-topology.md URL: http://svn.apache.org/viewvc/storm/branches/bobby-versioned-site/releases/0.9.6/Understanding-the-parallelism-of-a-Storm-topology.md?rev=1735492&r1=1735491&r2=1735492&view=diff ============================================================================== --- storm/branches/bobby-versioned-site/releases/0.9.6/Understanding-the-parallelism-of-a-Storm-topology.md (original) +++ storm/branches/bobby-versioned-site/releases/0.9.6/Understanding-the-parallelism-of-a-Storm-topology.md Thu Mar 17 20:19:17 2016 @@ -1,5 +1,6 @@ --- layout: documentation +version: v0.9.6 --- # What makes a running topology: worker processes, executors and tasks @@ -28,25 +29,25 @@ The following sections give an overview ## Number of worker processes * Description: How many worker processes to create _for the topology_ across machines in the cluster. -* Configuration option: [TOPOLOGY_WORKERS](/apidocs/backtype/storm/Config.html#TOPOLOGY_WORKERS) +* Configuration option: [TOPOLOGY_WORKERS](javadocs/backtype/storm/Config.html#TOPOLOGY_WORKERS) * How to set in your code (examples): - * [Config#setNumWorkers](/apidocs/backtype/storm/Config.html) + * [Config#setNumWorkers](javadocs/backtype/storm/Config.html) ## Number of executors (threads) * Description: How many executors to spawn _per component_. * Configuration option: ? * How to set in your code (examples): - * [TopologyBuilder#setSpout()](/apidocs/backtype/storm/topology/TopologyBuilder.html) - * [TopologyBuilder#setBolt()](/apidocs/backtype/storm/topology/TopologyBuilder.html) + * [TopologyBuilder#setSpout()](javadocs/backtype/storm/topology/TopologyBuilder.html) + * [TopologyBuilder#setBolt()](javadocs/backtype/storm/topology/TopologyBuilder.html) * Note that as of Storm 0.8 the ``parallelism_hint`` parameter now specifies the initial number of executors (not tasks!) for that bolt. 
 ## Number of tasks

 * Description: How many tasks to create _per component_.
-* Configuration option: [TOPOLOGY_TASKS](/apidocs/backtype/storm/Config.html#TOPOLOGY_TASKS)
+* Configuration option: [TOPOLOGY_TASKS](javadocs/backtype/storm/Config.html#TOPOLOGY_TASKS)
 * How to set in your code (examples):
-    * [ComponentConfigurationDeclarer#setNumTasks()](/apidocs/backtype/storm/topology/ComponentConfigurationDeclarer.html)
+    * [ComponentConfigurationDeclarer#setNumTasks()](javadocs/backtype/storm/topology/ComponentConfigurationDeclarer.html)

 Here is an example code snippet to show these settings in practice:

@@ -89,7 +90,7 @@ StormSubmitter.submitTopology(

 And of course Storm comes with additional configuration settings to control the parallelism of a topology, including:

-* [TOPOLOGY_MAX_TASK_PARALLELISM](/apidocs/backtype/storm/Config.html#TOPOLOGY_MAX_TASK_PARALLELISM): This setting puts a ceiling on the number of executors that can be spawned for a single component. It is typically used during testing to limit the number of threads spawned when running a topology in local mode. You can set this option via e.g. [Config#setMaxTaskParallelism()](/apidocs/backtype/storm/Config.html).
+* [TOPOLOGY_MAX_TASK_PARALLELISM](javadocs/backtype/storm/Config.html#TOPOLOGY_MAX_TASK_PARALLELISM): This setting puts a ceiling on the number of executors that can be spawned for a single component. It is typically used during testing to limit the number of threads spawned when running a topology in local mode. You can set this option via e.g. [Config#setMaxTaskParallelism()](javadocs/backtype/storm/Config.html).
 # How to change the parallelism of a running topology

@@ -117,5 +118,5 @@ $ storm rebalance mytopology -n 5 -e blu
 * [Running topologies on a production cluster](Running-topologies-on-a-production-cluster.html)]
 * [Local mode](Local-mode.html)
 * [Tutorial](Tutorial.html)
-* [Storm API documentation](/apidocs/), most notably the class ``Config``
+* [Storm API documentation](javadocs/), most notably the class ``Config``

Modified: storm/branches/bobby-versioned-site/releases/0.9.6/Using-non-JVM-languages-with-Storm.md
URL: http://svn.apache.org/viewvc/storm/branches/bobby-versioned-site/releases/0.9.6/Using-non-JVM-languages-with-Storm.md?rev=1735492&r1=1735491&r2=1735492&view=diff
==============================================================================
--- storm/branches/bobby-versioned-site/releases/0.9.6/Using-non-JVM-languages-with-Storm.md (original)
+++ storm/branches/bobby-versioned-site/releases/0.9.6/Using-non-JVM-languages-with-Storm.md Thu Mar 17 20:19:17 2016
@@ -1,5 +1,6 @@
 ---
 layout: documentation
+version: v0.9.6
 ---
 - two pieces: creating topologies and implementing spouts and bolts in other languages
 - creating topologies in another language is easy since topologies are just thrift structures (link to storm.thrift)
@@ -49,4 +50,4 @@ Then you can connect to Nimbus using the

 ```
 void submitTopology(1: string name, 2: string uploadedJarLocation, 3: string jsonConf, 4: StormTopology topology) throws (1: AlreadyAliveException e, 2: InvalidTopologyException ite);
-```
\ No newline at end of file
+```

Modified: storm/branches/bobby-versioned-site/releases/0.9.6/index.md
URL: http://svn.apache.org/viewvc/storm/branches/bobby-versioned-site/releases/0.9.6/index.md?rev=1735492&r1=1735491&r2=1735492&view=diff
==============================================================================
--- storm/branches/bobby-versioned-site/releases/0.9.6/index.md (original)
+++ storm/branches/bobby-versioned-site/releases/0.9.6/index.md Thu Mar 17 20:19:17 2016
@@ -1,5 +1,6 @@
 ---
 layout: documentation
+version: v0.9.6
 ---
 Storm is a distributed realtime computation system. Similar to how Hadoop provides a set of general primitives for doing batch processing, Storm provides a set of general primitives for doing realtime computation. Storm is simple, can be used with any programming language, [is used by many companies](/documentation/Powered-By.html), and is a lot of fun to use!
