Author: bobby
Date: Thu Mar 17 22:05:29 2016
New Revision: 1735510

URL: http://svn.apache.org/viewvc?rev=1735510&view=rev
Log:
Updated some of 0.10.0 to match what is in asf_site in git
Modified:
    storm/branches/bobby-versioned-site/releases/0.10.0/Command-line-client.md
    storm/branches/bobby-versioned-site/releases/0.10.0/Concepts.md
    storm/branches/bobby-versioned-site/releases/0.10.0/Configuration.md
    storm/branches/bobby-versioned-site/releases/0.10.0/FAQ.md
    storm/branches/bobby-versioned-site/releases/0.10.0/Hooks.md
    storm/branches/bobby-versioned-site/releases/0.10.0/Kestrel-and-Storm.md
    storm/branches/bobby-versioned-site/releases/0.10.0/Maven.md
    storm/branches/bobby-versioned-site/releases/0.10.0/Message-passing-implementation.md
    storm/branches/bobby-versioned-site/releases/0.10.0/Running-topologies-on-a-production-cluster.md
    storm/branches/bobby-versioned-site/releases/0.10.0/Serialization.md
    storm/branches/bobby-versioned-site/releases/0.10.0/Setting-up-a-Storm-cluster.md
    storm/branches/bobby-versioned-site/releases/0.10.0/Setting-up-development-environment.md
    storm/branches/bobby-versioned-site/releases/0.10.0/Trident-API-Overview.md
    storm/branches/bobby-versioned-site/releases/0.10.0/Trident-state.md
    storm/branches/bobby-versioned-site/releases/0.10.0/Trident-tutorial.md
    storm/branches/bobby-versioned-site/releases/0.10.0/Troubleshooting.md
    storm/branches/bobby-versioned-site/releases/0.10.0/Tutorial.md
    storm/branches/bobby-versioned-site/releases/0.10.0/Understanding-the-parallelism-of-a-Storm-topology.md
    storm/branches/bobby-versioned-site/releases/0.10.0/Using-non-JVM-languages-with-Storm.md
    storm/branches/bobby-versioned-site/releases/0.10.0/images/topology.png

Modified: storm/branches/bobby-versioned-site/releases/0.10.0/Command-line-client.md
URL: http://svn.apache.org/viewvc/storm/branches/bobby-versioned-site/releases/0.10.0/Command-line-client.md?rev=1735510&r1=1735509&r2=1735510&view=diff
==============================================================================
--- storm/branches/bobby-versioned-site/releases/0.10.0/Command-line-client.md (original)
+++ storm/branches/bobby-versioned-site/releases/0.10.0/Command-line-client.md Thu Mar 17 22:05:29 2016
@@ -4,7 +4,7 @@ layout: documentation
 documentation: true
 version: v0.10.0
 ---
-This page describes all the commands that are possible with the "storm" command line client. To learn how to set up your "storm" client to talk to a remote cluster, follow the instructions in [Setting up development environment](Setting-up-a-development-environment.html).
+This page describes all the commands that are possible with the "storm" command line client. To learn how to set up your "storm" client to talk to a remote cluster, follow the instructions in [Setting up development environment](Setting-up-development-environment.html).
 
 These commands are:
 
@@ -48,12 +48,14 @@ Deactivates the specified topology's spo
 
 ### rebalance
 
-Syntax: `storm rebalance topology-name [-w wait-time-secs]`
+Syntax: `storm rebalance topology-name [-w wait-time-secs] [-n new-num-workers] [-e component=parallelism]*`
 
 Sometimes you may wish to spread out where the workers for a topology are running. For example, let's say you have a 10 node cluster running 4 workers per node, and then let's say you add another 10 nodes to the cluster. You may wish to have Storm spread out the workers for the running topology so that each node runs 2 workers. One way to do this is to kill the topology and resubmit it, but Storm provides a "rebalance" command that provides an easier way to do this.
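As a concrete illustration of the extended syntax above (the topology and component names here are hypothetical):

```
storm rebalance mytopology -w 10 -n 40 -e blue-spout=4 -e yellow-bolt=8
```

This deactivates "mytopology" for 10 seconds, then redistributes it across 40 workers, with the parallelism of "blue-spout" set to 4 and "yellow-bolt" set to 8.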
 Rebalance will first deactivate the topology for the duration of the message timeout (overridable with the -w flag) and then redistribute the workers evenly around the cluster. The topology will then return to its previous state of activation (so a deactivated topology will still be deactivated and an activated topology will go back to being activated).
 
+The rebalance command can also be used to change the parallelism of a running topology. Use the -n and -e switches to change the number of workers or number of executors of a component respectively.
+
 ### repl
 
 Syntax: `storm repl`

Modified: storm/branches/bobby-versioned-site/releases/0.10.0/Concepts.md
URL: http://svn.apache.org/viewvc/storm/branches/bobby-versioned-site/releases/0.10.0/Concepts.md?rev=1735510&r1=1735509&r2=1735510&view=diff
==============================================================================
--- storm/branches/bobby-versioned-site/releases/0.10.0/Concepts.md (original)
+++ storm/branches/bobby-versioned-site/releases/0.10.0/Concepts.md Thu Mar 17 22:05:29 2016
@@ -79,7 +79,7 @@ Its perfectly fine to launch new threads
 Part of defining a topology is specifying for each bolt which streams it should receive as input. A stream grouping defines how that stream should be partitioned among the bolt's tasks.
 
-There are seven built-in stream groupings in Storm, and you can implement a custom stream grouping by implementing the [CustomStreamGrouping](javadocs/backtype/storm/grouping/CustomStreamGrouping.html) interface:
+There are eight built-in stream groupings in Storm, and you can implement a custom stream grouping by implementing the [CustomStreamGrouping](javadocs/backtype/storm/grouping/CustomStreamGrouping.html) interface:
 
 1. **Shuffle grouping**: Tuples are randomly distributed across the bolt's tasks in a way such that each bolt is guaranteed to get an equal number of tuples.
 2. **Fields grouping**: The stream is partitioned by the fields specified in the grouping. For example, if the stream is grouped by the "user-id" field, tuples with the same "user-id" will always go to the same task, but tuples with different "user-id"'s may go to different tasks.

Modified: storm/branches/bobby-versioned-site/releases/0.10.0/Configuration.md
URL: http://svn.apache.org/viewvc/storm/branches/bobby-versioned-site/releases/0.10.0/Configuration.md?rev=1735510&r1=1735509&r2=1735510&view=diff
==============================================================================
--- storm/branches/bobby-versioned-site/releases/0.10.0/Configuration.md (original)
+++ storm/branches/bobby-versioned-site/releases/0.10.0/Configuration.md Thu Mar 17 22:05:29 2016
@@ -6,7 +6,7 @@ version: v0.10.0
 ---
 Storm has a variety of configurations for tweaking the behavior of nimbus, supervisors, and running topologies. Some configurations are system configurations and cannot be modified on a topology by topology basis, whereas other configurations can be modified per topology.
 
-Every configuration has a default value defined in [defaults.yaml](https://github.com/apache/storm/{{page.version}}/master/conf/defaults.yaml) in the Storm codebase. You can override these configurations by defining a storm.yaml in the classpath of Nimbus and the supervisors. Finally, you can define a topology-specific configuration that you submit along with your topology when using [StormSubmitter](javadocs/backtype/storm/StormSubmitter.html). However, the topology-specific configuration can only override configs prefixed with "TOPOLOGY".
+Every configuration has a default value defined in [defaults.yaml](https://github.com/apache/storm/blob/{{page.version}}/conf/defaults.yaml) in the Storm codebase. You can override these configurations by defining a storm.yaml in the classpath of Nimbus and the supervisors. Finally, you can define a topology-specific configuration that you submit along with your topology when using [StormSubmitter](javadocs/backtype/storm/StormSubmitter.html). However, the topology-specific configuration can only override configs prefixed with "TOPOLOGY".
 
 Storm 0.7.0 and onwards lets you override configuration on a per-bolt/per-spout basis. The only configurations that can be overridden this way are:

Modified: storm/branches/bobby-versioned-site/releases/0.10.0/FAQ.md
URL: http://svn.apache.org/viewvc/storm/branches/bobby-versioned-site/releases/0.10.0/FAQ.md?rev=1735510&r1=1735509&r2=1735510&view=diff
==============================================================================
--- storm/branches/bobby-versioned-site/releases/0.10.0/FAQ.md (original)
+++ storm/branches/bobby-versioned-site/releases/0.10.0/FAQ.md Thu Mar 17 22:05:29 2016
@@ -29,7 +29,7 @@ version: v0.10.0
 
 ### Halp! I cannot see:
 
-* **my logs** Logs by default go to $STORM_HOME/logs. Check that you have write permissions to that directory. They are configured in the logback/cluster.xml (0.9) and log4j/*.properties in earlier versions.
+* **my logs** Logs by default go to $STORM_HOME/logs. Check that you have write permissions to that directory. They are configured in log4j2/{cluster, worker}.xml.
 * **final JVM settings** Add the `-XX:+PrintFlagsFinal` commandline option in the childopts (see the conf file)
 * **final Java system properties** Add `Properties props = System.getProperties(); props.list(System.out);` near where you build your topology.
 
@@ -63,6 +63,10 @@ You can join streams with join, merge or
 
 At time of writing, you can't emit to multiple output streams from Trident -- see [STORM-68](https://issues.apache.org/jira/browse/STORM-68)
 
+### Why am I getting a NotSerializableException/IllegalStateException when my topology is being started up?
+
+Within the Storm lifecycle, the topology is instantiated and then serialized to byte format to be stored in ZooKeeper, prior to the topology being executed. Within this step, if a spout or bolt within the topology has an initialized unserializable property, serialization will fail. If there is a need for a field that is unserializable, initialize it within the bolt or spout's prepare method, which is run after the topology is delivered to the worker.
+
 ## Spouts
 
 ### What is a coordinator, and why are there several?
 
@@ -110,7 +114,7 @@ You can't change the overall batch size
 
 ### How do I aggregate events by time?
 
-If have records with an immutable timestamp, and you would like to count, average or otherwise aggregate them into discrete time buckets, Trident is an excellent and scalable solution.
+If you have records with an immutable timestamp, and you would like to count, average or otherwise aggregate them into discrete time buckets, Trident is an excellent and scalable solution.
 
 Write an `Each` function that turns the timestamp into a time bucket: if the bucket size was "by hour", then the timestamp `2013-08-08 12:34:56` would be mapped to the `2013-08-08 12:00:00` time bucket, and so would everything else in the twelve o'clock hour. Then group on that timebucket and use a grouped persistentAggregate. The persistentAggregate uses a local cacheMap backed by a data store.
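A minimal sketch of such an `Each` function, assuming the 0.10.0 `storm.trident` API, epoch-millisecond timestamps, and invented field names:

```java
import backtype.storm.tuple.Values;
import storm.trident.operation.BaseFunction;
import storm.trident.operation.TridentCollector;
import storm.trident.tuple.TridentTuple;

// Maps an epoch-millis timestamp to the start of its hour, so that
// 2013-08-08 12:34:56 and everything else in the twelve o'clock hour
// land in the 2013-08-08 12:00:00 bucket.
public class HourBucket extends BaseFunction {
    private static final long HOUR_MS = 60L * 60L * 1000L;

    @Override
    public void execute(TridentTuple tuple, TridentCollector collector) {
        long ts = tuple.getLong(0);                      // the immutable event timestamp
        collector.emit(new Values(ts - (ts % HOUR_MS))); // floor to the hour bucket
    }
}
```

It could then be wired in roughly as `stream.each(new Fields("timestamp"), new HourBucket(), new Fields("bucket")).groupBy(new Fields("bucket"))`, followed by the grouped persistentAggregate described above.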
 Groups with many records require very few reads from the data store, and use efficient bulk reads and writes; as long as your data feed is relatively prompt Trident will make very efficient use of memory and network. Even if a server drops off line for a day, then delivers that full day's worth of data in a rush, the old results will be calmly retrieved and updated -- and without interfering with calculating the current results.

Modified: storm/branches/bobby-versioned-site/releases/0.10.0/Hooks.md
URL: http://svn.apache.org/viewvc/storm/branches/bobby-versioned-site/releases/0.10.0/Hooks.md?rev=1735510&r1=1735509&r2=1735510&view=diff
==============================================================================
--- storm/branches/bobby-versioned-site/releases/0.10.0/Hooks.md (original)
+++ storm/branches/bobby-versioned-site/releases/0.10.0/Hooks.md Thu Mar 17 22:05:29 2016
@@ -6,5 +6,5 @@ version: v0.10.0
 ---
 Storm provides hooks with which you can insert custom code to run on any number of events within Storm. You create a hook by extending the [BaseTaskHook](javadocs/backtype/storm/hooks/BaseTaskHook.html) class and overriding the appropriate method for the event you want to catch. There are two ways to register your hook:
 
-1. In the open method of your spout or prepare method of your bolt using the [TopologyContext#addTaskHook](javadocs/backtype/storm/task/TopologyContext.html) method.
+1. In the open method of your spout or prepare method of your bolt using the [TopologyContext](javadocs/backtype/storm/task/TopologyContext.html#addTaskHook) method.
 2. Through the Storm configuration using the ["topology.auto.task.hooks"](javadocs/backtype/storm/Config.html#TOPOLOGY_AUTO_TASK_HOOKS) config. These hooks are automatically registered in every spout or bolt, and are useful for doing things like integrating with a custom monitoring system.

Modified: storm/branches/bobby-versioned-site/releases/0.10.0/Kestrel-and-Storm.md
URL: http://svn.apache.org/viewvc/storm/branches/bobby-versioned-site/releases/0.10.0/Kestrel-and-Storm.md?rev=1735510&r1=1735509&r2=1735510&view=diff
==============================================================================
--- storm/branches/bobby-versioned-site/releases/0.10.0/Kestrel-and-Storm.md (original)
+++ storm/branches/bobby-versioned-site/releases/0.10.0/Kestrel-and-Storm.md Thu Mar 17 22:05:29 2016
@@ -8,7 +8,7 @@ This page explains how to use to Storm t
 ## Preliminaries
 ### Storm
-This tutorial uses examples from the [storm-kestrel](https://github.com/nathanmarz/storm-kestrel) project and the [storm-starter](http://github.com/apache/storm/blob/{{page.version}}/examples/storm-starter) project. It's recommended that you clone those projects and follow along with the examples. Read [Setting up development environment](https://github.com/apache/storm/wiki/Setting-up-development-environment) and [Creating a new Storm project](https://github.com/apache/storm/wiki/Creating-a-new-Storm-project) to get your machine set up.
+This tutorial uses examples from the [storm-kestrel](https://github.com/nathanmarz/storm-kestrel) project and the [storm-starter](http://github.com/apache/storm/blob/{{page.version}}/examples/storm-starter) project. It's recommended that you clone those projects and follow along with the examples. Read [Setting up development environment](Setting-up-development-environment.html) and [Creating a new Storm project](Creating-a-new-Storm-project.html) to get your machine set up.
 ### Kestrel
 It assumes you are able to run locally a Kestrel server as described [here](https://github.com/nathanmarz/storm-kestrel).

Modified: storm/branches/bobby-versioned-site/releases/0.10.0/Maven.md
URL: http://svn.apache.org/viewvc/storm/branches/bobby-versioned-site/releases/0.10.0/Maven.md?rev=1735510&r1=1735509&r2=1735510&view=diff
==============================================================================
--- storm/branches/bobby-versioned-site/releases/0.10.0/Maven.md (original)
+++ storm/branches/bobby-versioned-site/releases/0.10.0/Maven.md Thu Mar 17 22:05:29 2016
@@ -1,5 +1,7 @@
 ---
+title: Maven
 layout: documentation
+documentation: true
 version: v0.10.0
 raw-version: 0.10.0
 ---

Modified: storm/branches/bobby-versioned-site/releases/0.10.0/Message-passing-implementation.md
URL: http://svn.apache.org/viewvc/storm/branches/bobby-versioned-site/releases/0.10.0/Message-passing-implementation.md?rev=1735510&r1=1735509&r2=1735510&view=diff
==============================================================================
--- storm/branches/bobby-versioned-site/releases/0.10.0/Message-passing-implementation.md (original)
+++ storm/branches/bobby-versioned-site/releases/0.10.0/Message-passing-implementation.md Thu Mar 17 22:05:29 2016
@@ -9,23 +9,23 @@ version: v0.10.0
 This page walks through how emitting and transferring tuples works in Storm.
 
 - Worker is responsible for message transfer
-   - `refresh-connections` is called every "task.refresh.poll.secs" or whenever assignment in ZK changes. It manages connections to other workers and maintains a mapping from task -> worker [code](https://github.com/apache/incubator-storm/blob/0.7.1/src/clj/backtype/storm/daemon/worker.clj#L123)
-   - Provides a "transfer function" that is used by tasks to send tuples to other tasks. The transfer function takes in a task id and a tuple, and it serializes the tuple and puts it onto a "transfer queue". There is a single transfer queue for each worker. [code](https://github.com/apache/incubator-storm/blob/0.7.1/src/clj/backtype/storm/daemon/worker.clj#L56)
-   - The serializer is thread-safe [code](https://github.com/apache/incubator-storm/blob/0.7.1/src/jvm/backtype/storm/serialization/KryoTupleSerializer.java#L26)
-   - The worker has a single thread which drains the transfer queue and sends the messages to other workers [code](https://github.com/apache/incubator-storm/blob/0.7.1/src/clj/backtype/storm/daemon/worker.clj#L185)
-   - Message sending happens through this protocol: [code](https://github.com/apache/incubator-storm/blob/0.7.1/src/clj/backtype/storm/messaging/protocol.clj)
-   - The implementation for distributed mode uses ZeroMQ [code](https://github.com/apache/incubator-storm/blob/0.7.1/src/clj/backtype/storm/messaging/zmq.clj)
-   - The implementation for local mode uses in memory Java queues (so that it's easy to use Storm locally without needing to get ZeroMQ installed) [code](https://github.com/apache/incubator-storm/blob/0.7.1/src/clj/backtype/storm/messaging/local.clj)
+   - `refresh-connections` is called every "task.refresh.poll.secs" or whenever assignment in ZK changes. It manages connections to other workers and maintains a mapping from task -> worker [code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/worker.clj#L123)
+   - Provides a "transfer function" that is used by tasks to send tuples to other tasks. The transfer function takes in a task id and a tuple, and it serializes the tuple and puts it onto a "transfer queue". There is a single transfer queue for each worker. [code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/worker.clj#L56)
+   - The serializer is thread-safe [code](https://github.com/apache/storm/blob/0.7.1/src/jvm/backtype/storm/serialization/KryoTupleSerializer.java#L26)
+   - The worker has a single thread which drains the transfer queue and sends the messages to other workers [code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/worker.clj#L185)
+   - Message sending happens through this protocol: [code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/messaging/protocol.clj)
+   - The implementation for distributed mode uses ZeroMQ [code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/messaging/zmq.clj)
+   - The implementation for local mode uses in memory Java queues (so that it's easy to use Storm locally without needing to get ZeroMQ installed) [code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/messaging/local.clj)
 - Receiving messages in tasks works differently in local mode and distributed mode
-   - In local mode, the tuple is sent directly to an in-memory queue for the receiving task [code](https://github.com/apache/incubator-storm/blob/master/src/clj/backtype/storm/messaging/local.clj#L21)
-   - In distributed mode, each worker listens on a single TCP port for incoming messages and then routes those messages in-memory to tasks. The TCP port is called a "virtual port", because it receives [task id, message] and then routes it to the actual task. [code](https://github.com/apache/incubator-storm/blob/master/src/clj/backtype/storm/daemon/worker.clj#L204)
-   - The virtual port implementation is here: [code](https://github.com/apache/incubator-storm/blob/master/src/clj/zilch/virtual_port.clj)
-   - Tasks listen on an in-memory ZeroMQ port for messages from the virtual port [code](https://github.com/apache/incubator-storm/blob/master/src/clj/backtype/storm/daemon/task.clj#L201)
-   - Bolts listen here: [code](https://github.com/apache/incubator-storm/blob/master/src/clj/backtype/storm/daemon/task.clj#L489)
-   - Spouts listen here: [code](https://github.com/apache/incubator-storm/blob/master/src/clj/backtype/storm/daemon/task.clj#L382)
+   - In local mode, the tuple is sent directly to an in-memory queue for the receiving task [code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/messaging/local.clj#L21)
+   - In distributed mode, each worker listens on a single TCP port for incoming messages and then routes those messages in-memory to tasks. The TCP port is called a "virtual port", because it receives [task id, message] and then routes it to the actual task. [code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/worker.clj#L204)
+   - The virtual port implementation is here: [code](https://github.com/apache/storm/blob/0.7.1/src/clj/zilch/virtual_port.clj)
+   - Tasks listen on an in-memory ZeroMQ port for messages from the virtual port [code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/task.clj#L201)
+   - Bolts listen here: [code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/task.clj#L489)
+   - Spouts listen here: [code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/task.clj#L382)
 - Tasks are responsible for message routing. A tuple is emitted either to a direct stream (where the task id is specified) or a regular stream. In direct streams, the message is only sent if that bolt subscribes to that direct stream. In regular streams, the stream grouping functions are used to determine the task ids to send the tuple to.
-   - Tasks have a routing map from {stream id} -> {component id} -> {stream grouping function} [code](https://github.com/apache/incubator-storm/blob/master/src/clj/backtype/storm/daemon/task.clj#L198)
-   - The "tasks-fn" returns the task ids to send the tuples to for either regular stream emit or direct stream emit [code](https://github.com/apache/incubator-storm/blob/master/src/clj/backtype/storm/daemon/task.clj#L207)
+   - Tasks have a routing map from {stream id} -> {component id} -> {stream grouping function} [code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/task.clj#L198)
+   - The "tasks-fn" returns the task ids to send the tuples to for either regular stream emit or direct stream emit [code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/task.clj#L207)
 - After getting the output task ids, bolts and spouts use the transfer-fn provided by the worker to actually transfer the tuples
-   - Bolt transfer code here: [code](https://github.com/apache/incubator-storm/blob/master/src/clj/backtype/storm/daemon/task.clj#L429)
-   - Spout transfer code here: [code](https://github.com/apache/incubator-storm/blob/master/src/clj/backtype/storm/daemon/task.clj#L329)
+   - Bolt transfer code here: [code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/task.clj#L429)
+   - Spout transfer code here: [code](https://github.com/apache/storm/blob/0.7.1/src/clj/backtype/storm/daemon/task.clj#L329)

Modified: storm/branches/bobby-versioned-site/releases/0.10.0/Running-topologies-on-a-production-cluster.md
URL: http://svn.apache.org/viewvc/storm/branches/bobby-versioned-site/releases/0.10.0/Running-topologies-on-a-production-cluster.md?rev=1735510&r1=1735509&r2=1735510&view=diff
==============================================================================
--- storm/branches/bobby-versioned-site/releases/0.10.0/Running-topologies-on-a-production-cluster.md (original)
+++ storm/branches/bobby-versioned-site/releases/0.10.0/Running-topologies-on-a-production-cluster.md Thu Mar 17 22:05:29 2016
@@ -51,7 +51,7 @@ You can find out how to configure your `
 There are a variety of configurations you can set per topology. A list of all the configurations you can set can be found [here](javadocs/backtype/storm/Config.html). The ones prefixed with "TOPOLOGY" can be overridden on a topology-specific basis (the other ones are cluster configurations and cannot be overridden). Here are some common ones that are set for a topology:
 
 1. **Config.TOPOLOGY_WORKERS**: This sets the number of worker processes to use to execute the topology. For example, if you set this to 25, there will be 25 Java processes across the cluster executing all the tasks. If you had a combined 150 parallelism across all components in the topology, each worker process will have 6 tasks running within it as threads.
-2. **Config.TOPOLOGY_ACKERS**: This sets the number of tasks that will track tuple trees and detect when a spout tuple has been fully processed. Ackers are an integral part of Storm's reliability model and you can read more about them on [Guaranteeing message processing](Guaranteeing-message-processing.html).
+2. **Config.TOPOLOGY_ACKER_EXECUTORS**: This sets the number of executors that will track tuple trees and detect when a spout tuple has been fully processed. Ackers are an integral part of Storm's reliability model and you can read more about them on [Guaranteeing message processing](Guaranteeing-message-processing.html). By not setting this variable or setting it as null, Storm will set the number of acker executors to be equal to the number of workers configured for this topology. If this variable is set to 0, then Storm will immediately ack tuples as soon as they come off the spout, effectively disabling reliability.
 3. **Config.TOPOLOGY_MAX_SPOUT_PENDING**: This sets the maximum number of spout tuples that can be pending on a single spout task at once (pending means the tuple has not been acked or failed yet). It is highly recommended you set this config to prevent queue explosion.
 4. **Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS**: This is the maximum amount of time a spout tuple has to be fully completed before it is considered failed. This value defaults to 30 seconds, which is sufficient for most topologies. See [Guaranteeing message processing](Guaranteeing-message-processing.html) for more information on how Storm's reliability model works.
 5. **Config.TOPOLOGY_SERIALIZATIONS**: You can register more serializers to Storm using this config so that you can use custom types within tuples.

Modified: storm/branches/bobby-versioned-site/releases/0.10.0/Serialization.md
URL: http://svn.apache.org/viewvc/storm/branches/bobby-versioned-site/releases/0.10.0/Serialization.md?rev=1735510&r1=1735509&r2=1735510&view=diff
==============================================================================
--- storm/branches/bobby-versioned-site/releases/0.10.0/Serialization.md (original)
+++ storm/branches/bobby-versioned-site/releases/0.10.0/Serialization.md Thu Mar 17 22:05:29 2016
@@ -8,7 +8,7 @@ This page is about how the serialization
 Tuples can be comprised of objects of any types. Since Storm is a distributed system, it needs to know how to serialize and deserialize objects when they're passed between tasks.
 
-Storm uses [Kryo](http://code.google.com/p/kryo/) for serialization. Kryo is a flexible and fast serialization library that produces small serializations.
+Storm uses [Kryo](https://github.com/EsotericSoftware/kryo) for serialization. Kryo is a flexible and fast serialization library that produces small serializations.
 
 By default, Storm can serialize primitive types, strings, byte arrays, ArrayList, HashMap, HashSet, and the Clojure collection types. If you want to use another type in your tuples, you'll need to register a custom serializer.
 
@@ -24,12 +24,12 @@ Finally, another reason for using dynami
 
 ### Custom serialization
 
-As mentioned, Storm uses Kryo for serialization. To implement custom serializers, you need to register new serializers with Kryo. It's highly recommended that you read over [Kryo's home page](http://code.google.com/p/kryo/) to understand how it handles custom serialization.
+As mentioned, Storm uses Kryo for serialization. To implement custom serializers, you need to register new serializers with Kryo. It's highly recommended that you read over [Kryo's home page](https://github.com/EsotericSoftware/kryo) to understand how it handles custom serialization.
 
 Adding custom serializers is done through the "topology.kryo.register" property in your topology config. It takes a list of registrations, where each registration can take one of two forms:
 
 1. The name of a class to register. In this case, Storm will use Kryo's `FieldsSerializer` to serialize the class. This may or may not be optimal for the class -- see the Kryo docs for more details.
-2. A map from the name of a class to register to an implementation of [com.esotericsoftware.kryo.Serializer](http://code.google.com/p/kryo/source/browse/trunk/src/com/esotericsoftware/kryo/Serializer.java).
+2. A map from the name of a class to register to an implementation of [com.esotericsoftware.kryo.Serializer](https://github.com/EsotericSoftware/kryo/blob/master/src/com/esotericsoftware/kryo/Serializer.java).
 
 Let's look at an example.

Modified: storm/branches/bobby-versioned-site/releases/0.10.0/Setting-up-a-Storm-cluster.md
URL: http://svn.apache.org/viewvc/storm/branches/bobby-versioned-site/releases/0.10.0/Setting-up-a-Storm-cluster.md?rev=1735510&r1=1735509&r2=1735510&view=diff
==============================================================================
--- storm/branches/bobby-versioned-site/releases/0.10.0/Setting-up-a-Storm-cluster.md (original)
+++ storm/branches/bobby-versioned-site/releases/0.10.0/Setting-up-a-Storm-cluster.md Thu Mar 17 22:05:29 2016
@@ -65,11 +65,13 @@ storm.local.dir: "C:\\storm-local"
 ```
 If you use a relative path, it will be relative to where you installed storm (STORM_HOME). You can leave it empty with default value `$STORM_HOME/storm-local`
 
-3) **nimbus.host**: The worker nodes need to know which machine is the master in order to download topology jars and confs. For example:
+
+3) **nimbus.seeds**: The worker nodes need to know which machines are the candidates for master in order to download topology jars and confs. For example:
 
 ```yaml
-nimbus.host: "111.222.333.44"
+nimbus.seeds: ["111.222.333.44"]
 ```
+You're encouraged to fill in this value with the list of **machines' FQDNs**. If you want to set up Nimbus H/A, you have to list the FQDN of every machine that runs nimbus. You may want to leave it at the default value when you just want to set up a 'pseudo-distributed' cluster, but you're still encouraged to use FQDNs.
 
 4) **supervisor.slots.ports**: For each worker machine, you configure how many workers run on that machine with this config. Each worker uses a single port for receiving messages, and this setting defines which ports are open for use. If you define five ports here, then Storm will allocate up to five workers to run on this machine. If you define three ports, Storm will only run up to three. By default, this setting is configured to run 4 workers on the ports 6700, 6701, 6702, and 6703. For example:
 
@@ -81,6 +83,25 @@ supervisor.slots.ports:
     - 6703
 ```
 
+### Monitoring Health of Supervisors
+
+Storm provides a mechanism by which administrators can configure the supervisor to run administrator supplied scripts periodically to determine if a node is healthy or not. Administrators can have the supervisor determine if the node is in a healthy state by performing any checks of their choice in scripts located in storm.health.check.dir. If a script detects the node to be in an unhealthy state, it must print a line to standard output beginning with the string ERROR. The supervisor will periodically run the scripts in the health check dir and check the output. If the script's output contains the string ERROR, as described above, the supervisor will shut down any workers and exit.
+
+If the supervisor is running with supervision, "/bin/storm node-health-check" can be called to determine if the supervisor should be launched or if the node is unhealthy.
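A minimal sketch of such a health check script, assuming bash and GNU coreutils; the disk-usage threshold is an invented example, and the only contract taken from the text above is that an unhealthy node must print a line beginning with ERROR to standard output:

```
#!/bin/bash
# Hypothetical check: report the node unhealthy when the root disk is over 95% full.
usage=$(df / --output=pcent | tail -n 1 | tr -dc '0-9')
if [ "$usage" -gt 95 ]; then
  echo "ERROR: root filesystem is ${usage}% full"
fi
```

Per the configuration lines that follow, the script must be executable, live in storm.health.check.dir, and finish within storm.health.check.timeout.ms.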
+
+The health check directory location can be configured with:
+
+```yaml
+storm.health.check.dir: "healthchecks"
+
+```
+The scripts must have execute permissions.
+The time to allow any given healthcheck script to run before it is marked failed due to timeout can be configured with:
+
+```yaml
+storm.health.check.timeout.ms: 5000
+```
+
 ### Configure external libraries and environmental variables (optional)
 
 If you need support from external libraries or custom plugins, you can place such jars into the extlib/ and extlib-daemon/ directories. Note that the extlib-daemon/ directory stores jars used only by daemons (Nimbus, Supervisor, DRPC, UI, Logviewer), e.g., HDFS and customized scheduling libraries. Accordingly, two environmental variables STORM_EXT_CLASSPATH and STORM_EXT_CLASSPATH_DAEMON can be configured by users for including the external classpath and daemon-only external classpath.
 
@@ -92,6 +113,6 @@ The last step is to launch all the Storm
 
 1. **Nimbus**: Run the command "bin/storm nimbus" under supervision on the master machine.
 2. **Supervisor**: Run the command "bin/storm supervisor" under supervision on each worker machine. The supervisor daemon is responsible for starting and stopping worker processes on that machine.
-3. **UI**: Run the Storm UI (a site you can access from the browser that gives diagnostics on the cluster and topologies) by running the command "bin/storm ui" under supervision. The UI can be accessed by navigating your web browser to http://{nimbus host}:8080.
+3. **UI**: Run the Storm UI (a site you can access from the browser that gives diagnostics on the cluster and topologies) by running the command "bin/storm ui" under supervision. The UI can be accessed by navigating your web browser to http://{ui host}:8080.
 
 As you can see, running the daemons is very straightforward. The daemons will log to the logs/ directory in wherever you extracted the Storm release.

Modified: storm/branches/bobby-versioned-site/releases/0.10.0/Setting-up-development-environment.md
URL: http://svn.apache.org/viewvc/storm/branches/bobby-versioned-site/releases/0.10.0/Setting-up-development-environment.md?rev=1735510&r1=1735509&r2=1735510&view=diff
==============================================================================
--- storm/branches/bobby-versioned-site/releases/0.10.0/Setting-up-development-environment.md (original)
+++ storm/branches/bobby-versioned-site/releases/0.10.0/Setting-up-development-environment.md Thu Mar 17 22:05:29 2016
@@ -30,13 +30,5 @@ Installing a Storm release locally is on
 The previous step installed the `storm` client on your machine which is used to communicate with remote Storm clusters. Now all you have to do is tell the client which Storm cluster to talk to. To do this, all you have to do is put the host address of the master in the `~/.storm/storm.yaml` file. It should look something like this:
 
 ```
-nimbus.host: "123.45.678.890"
+nimbus.seeds: ["123.45.678.890"]
 ```
-
-Alternatively, if you use the [storm-deploy](https://github.com/nathanmarz/storm-deploy) project to provision Storm clusters on AWS, it will automatically set up your ~/.storm/storm.yaml file. You can manually attach to a Storm cluster (or switch between multiple clusters) using the "attach" command, like so:
-
-```
-lein run :deploy --attach --name mystormcluster
-```
-
-More information is on the storm-deploy [wiki](https://github.com/nathanmarz/storm-deploy/wiki)

Modified: storm/branches/bobby-versioned-site/releases/0.10.0/Trident-API-Overview.md
URL: http://svn.apache.org/viewvc/storm/branches/bobby-versioned-site/releases/0.10.0/Trident-API-Overview.md?rev=1735510&r1=1735509&r2=1735510&view=diff
==============================================================================
--- storm/branches/bobby-versioned-site/releases/0.10.0/Trident-API-Overview.md (original)
+++ storm/branches/bobby-versioned-site/releases/0.10.0/Trident-API-Overview.md Thu Mar 17 22:05:29 2016
@@ -78,15 +78,228 @@ Now suppose you had these tuples with fi
 
 If you ran this code:
 
 ```java
-mystream.each(new Fields("b", "a"), new MyFilter())
+mystream.filter(new MyFilter())
 ```
 
 The resulting tuples would be:
 
 ```
-[2, 1, 1]
+[1, 2, 3]
 ```
 
+### map and flatMap
+
+`map` returns a stream consisting of the result of applying the given mapping function to the tuples of the stream. This
+can be used to apply a one-one transformation to the tuples.
+
+For example, if there is a stream of words and you wanted to convert it to a stream of upper case words,
+you could define a mapping function as follows,
+
+```java
+public class UpperCase implements MapFunction {
+  @Override
+  public Values execute(TridentTuple input) {
+    return new Values(input.getString(0).toUpperCase());
+  }
+}
+```
+
+The mapping function can then be applied on the stream to produce a stream of uppercase words.
+
+```java
+mystream.map(new UpperCase())
+```
+
+`flatMap` is similar to `map` but has the effect of applying a one-to-many transformation to the values of the stream,
+and then flattening the resulting elements into a new stream.
+
+For example, if there is a stream of sentences and you wanted to convert it to a stream of words,
+you could define a flatMap function as follows,
+
+```java
+public class Split implements FlatMapFunction {
+  @Override
+  public Iterable<Values> execute(TridentTuple input) {
+    List<Values> valuesList = new ArrayList<>();
+    for (String word : input.getString(0).split(" ")) {
+      valuesList.add(new Values(word));
+    }
+    return valuesList;
+  }
+}
+```
+
+The flatMap function can then be applied on the stream of sentences to produce a stream of words,
+
+```java
+mystream.flatMap(new Split())
+```
+
+Of course these operations can be chained, so a stream of uppercase words can be obtained from a stream of sentences as follows,
+
+```java
+mystream.flatMap(new Split()).map(new UpperCase())
+```
+
+### peek
+`peek` can be used to perform an additional action on each trident tuple as they flow through the stream.
+This could be useful for debugging to see the tuples as they flow past a certain point in a pipeline.
+
+For example, the below code would print the result of converting the words to uppercase before they are passed to `groupBy`
+```java
+ mystream.flatMap(new Split()).map(new UpperCase())
+         .peek(new Consumer() {
+             @Override
+             public void accept(TridentTuple input) {
+               System.out.println(input.getString(0));
+             }
+         })
+         .groupBy(new Fields("word"))
+         .persistentAggregate(new MemoryMapState.Factory(), new Count(), new Fields("count"))
+```
+
+### min and minBy
+`min` and `minBy` operations return minimum value on each partition of a batch of tuples in a trident stream.
+
+Suppose, a trident stream contains fields ["device-id", "count"] and the following partitions of tuples
+
+```
+Partition 0:
+[123, 2]
+[113, 54]
+[23, 28]
+[237, 37]
+[12, 23]
+[62, 17]
+[98, 42]
+
+Partition 1:
+[64, 18]
+[72, 54]
+[2, 28]
+[742, 71]
+[98, 45]
+[62, 12]
+[19, 174]
+
+
+Partition 2:
+[27, 94]
+[82, 23]
+[9, 86]
+[53, 71]
+[74, 37]
+[51, 49]
+[37, 98]
+
+```
+
+`minBy` operation can be applied on the above stream of tuples like below which results in emitting tuples with minimum values of `count` field in each partition.
+
+``` java
+  mystream.minBy(new Fields("count"))
+```
+Result of the above code on mentioned partitions is:
+
+```
+Partition 0:
+[123, 2]
+
+
+Partition 1:
+[62, 12]
+
+
+Partition 2:
+[82, 23]
+
+```
+
+You can look at other `min` and `minBy` operations on Stream
+
+``` java
+  public <T> Stream minBy(String inputFieldName, Comparator<T> comparator)
+  public Stream min(Comparator<TridentTuple> comparator)
+```
+Below example shows how these APIs can be used to find minimum using respective Comparators on a tuple.
+
+``` java
+
+  FixedBatchSpout spout = new FixedBatchSpout(allFields, 10, Vehicle.generateVehicles(20));
+
+  TridentTopology topology = new TridentTopology();
+  Stream vehiclesStream = topology.newStream("spout1", spout).
+          each(allFields, new Debug("##### vehicles"));
+
+  Stream slowVehiclesStream =
+          vehiclesStream
+                  .min(new SpeedComparator()) // Comparator w.r.t speed on received tuple.
+                  .each(vehicleField, new Debug("#### slowest vehicle"));
+
+  vehiclesStream
+          .minBy(Vehicle.FIELD_NAME, new EfficiencyComparator()) // Comparator w.r.t efficiency on received tuple.
+          .each(vehicleField, new Debug("#### least efficient vehicle"));
+
+```
+Example applications of these APIs can be located at [TridentMinMaxOfDevicesTopology](https://github.com/apache/storm/blob/master/examples/storm-starter/src/jvm/org/apache/storm/starter/trident/TridentMinMaxOfDevicesTopology.java) and [TridentMinMaxOfVehiclesTopology](https://github.com/apache/storm/blob/master/examples/storm-starter/src/jvm/org/apache/storm/starter/trident/TridentMinMaxOfVehiclesTopology.java)
+
+### max and maxBy
+`max` and `maxBy` operations return maximum value on each partition of a batch of tuples in a trident stream.
+
+Suppose, a trident stream contains fields ["device-id", "count"] as mentioned in the above section.
+
+`max` and `maxBy` operations can be applied on the above stream of tuples like below which results in emitting tuples with maximum values of `count` field for each partition.
+
+``` java
+  mystream.maxBy(new Fields("count"))
+```
+Result of the above code on mentioned partitions is:
+
+```
+Partition 0:
+[113, 54]
+
+
+Partition 1:
+[19, 174]
+
+
+Partition 2:
+[37, 98]
+
+```
+
+You can look at other `max` and `maxBy` functions on Stream
+
+``` java
+
+  public <T> Stream maxBy(String inputFieldName, Comparator<T> comparator)
+  public Stream max(Comparator<TridentTuple> comparator)
+
+```
+
+Below example shows how these APIs can be used to find maximum using respective Comparators on a tuple.
+
+``` java
+
+  FixedBatchSpout spout = new FixedBatchSpout(allFields, 10, Vehicle.generateVehicles(20));
+
+  TridentTopology topology = new TridentTopology();
+  Stream vehiclesStream = topology.newStream("spout1", spout).
+          each(allFields, new Debug("##### vehicles"));
+
+  vehiclesStream
+          .max(new SpeedComparator()) // Comparator w.r.t speed on received tuple.
+          .each(vehicleField, new Debug("#### fastest vehicle"))
+          .project(driverField)
+          .each(driverField, new Debug("##### fastest driver"));
+
+  vehiclesStream
+          .maxBy(Vehicle.FIELD_NAME, new EfficiencyComparator()) // Comparator w.r.t efficiency on received tuple.
+          .each(vehicleField, new Debug("#### most efficient vehicle"));
+
+```
+
+Example applications of these APIs can be located at [TridentMinMaxOfDevicesTopology](https://github.com/apache/storm/blob/master/examples/storm-starter/src/jvm/org/apache/storm/starter/trident/TridentMinMaxOfDevicesTopology.java) and [TridentMinMaxOfVehiclesTopology](https://github.com/apache/storm/blob/master/examples/storm-starter/src/jvm/org/apache/storm/starter/trident/TridentMinMaxOfVehiclesTopology.java)
+
 ### partitionAggregate
 
 partitionAggregate runs a function on each partition of a batch of tuples. Unlike functions, the tuples emitted by partitionAggregate replace the input tuples given to it. Consider this example:

Modified: storm/branches/bobby-versioned-site/releases/0.10.0/Trident-state.md
URL: http://svn.apache.org/viewvc/storm/branches/bobby-versioned-site/releases/0.10.0/Trident-state.md?rev=1735510&r1=1735509&r2=1735510&view=diff
==============================================================================
--- storm/branches/bobby-versioned-site/releases/0.10.0/Trident-state.md (original)
+++ storm/branches/bobby-versioned-site/releases/0.10.0/Trident-state.md Thu Mar 17 22:05:29 2016
@@ -329,4 +329,4 @@ Trident also provides the [CachedMap](ht
 Finally, Trident provides the [SnapshottableMap](https://github.com/apache/storm/blob/{{page.version}}/storm-core/src/jvm/storm/trident/state/map/SnapshottableMap.java) class that turns a MapState into a Snapshottable object, by storing global aggregations into a fixed key.
 
-Take a look at the implementation of [MemcachedState](https://github.com/nathanmarz/trident-memcached/blob/{{page.version}}/src/jvm/trident/memcached/MemcachedState.java) to see how all these utilities can be put together to make a high performance MapState implementation. MemcachedState allows you to choose between opaque transactional, transactional, and non-transactional semantics.
+Take a look at the implementation of [MemcachedState](https://github.com/nathanmarz/trident-memcached/blob/master/src/jvm/trident/memcached/MemcachedState.java) to see how all these utilities can be put together to make a high performance MapState implementation. MemcachedState allows you to choose between opaque transactional, transactional, and non-transactional semantics.

Modified: storm/branches/bobby-versioned-site/releases/0.10.0/Trident-tutorial.md
URL: http://svn.apache.org/viewvc/storm/branches/bobby-versioned-site/releases/0.10.0/Trident-tutorial.md?rev=1735510&r1=1735509&r2=1735510&view=diff
==============================================================================
--- storm/branches/bobby-versioned-site/releases/0.10.0/Trident-tutorial.md (original)
+++ storm/branches/bobby-versioned-site/releases/0.10.0/Trident-tutorial.md Thu Mar 17 22:05:29 2016
@@ -236,7 +236,7 @@ Trident solves this problem by doing two
 
 With these two primitives, you can achieve exactly-once semantics with your state updates. Rather than store just the count in the database, what you can do instead is store the transaction id with the count in the database as an atomic value. Then, when updating the count, you can just compare the transaction id in the database with the transaction id for the current batch. If they're the same, you skip the update -- because of the strong ordering, you know for sure that the value in the database incorporates the current batch. If they're different, you increment the count.
 
-Of course, you don't have to do this logic manually in your topologies. This logic is wrapped by the State abstraction and done automatically. Nor is your State object required to implement the transaction id trick: if you don't want to pay the cost of storing the transaction id in the database, you don't have to. In that case the State will have at-least-once-processing semantics in the case of failures (which may be fine for your application). You can read more about how to implement a State and the various fault-tolerance tradeoffs possible [in this doc](/documentation/Trident-state).
+Of course, you don't have to do this logic manually in your topologies. This logic is wrapped by the State abstraction and done automatically. Nor is your State object required to implement the transaction id trick: if you don't want to pay the cost of storing the transaction id in the database, you don't have to. In that case the State will have at-least-once-processing semantics in the case of failures (which may be fine for your application). You can read more about how to implement a State and the various fault-tolerance tradeoffs possible [in this doc](/documentation/Trident-state.html).
 
 A State is allowed to use whatever strategy it wants to store state. So it could store state in an external database or it could keep the state in-memory but backed by HDFS (like how HBase works). State's are not required to hold onto state forever. For example, you could have an in-memory State implementation that only keeps the last X hours of data available and drops anything older. Take a look at the implementation of the [Memcached integration](https://github.com/nathanmarz/trident-memcached/blob/master/src/jvm/trident/memcached/MemcachedState.java) for an example State implementation.

Modified: storm/branches/bobby-versioned-site/releases/0.10.0/Troubleshooting.md
URL: http://svn.apache.org/viewvc/storm/branches/bobby-versioned-site/releases/0.10.0/Troubleshooting.md?rev=1735510&r1=1735509&r2=1735510&view=diff
==============================================================================
--- storm/branches/bobby-versioned-site/releases/0.10.0/Troubleshooting.md (original)
+++ storm/branches/bobby-versioned-site/releases/0.10.0/Troubleshooting.md Thu Mar 17 22:05:29 2016
@@ -141,6 +141,43 @@ Caused by: java.lang.NullPointerExceptio
 ... 6 more
 ```
 
+or
+
+```
+java.lang.RuntimeException: java.lang.NullPointerException
+    at
+backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:128)
+~[storm-core-0.9.3.jar:0.9.3]
+    at
+backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:99)
+~[storm-core-0.9.3.jar:0.9.3]
+    at
+backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:80)
+~[storm-core-0.9.3.jar:0.9.3]
+    at
+backtype.storm.disruptor$consume_loop_STAR_$fn__759.invoke(disruptor.clj:94)
+~[storm-core-0.9.3.jar:0.9.3]
+    at backtype.storm.util$async_loop$fn__458.invoke(util.clj:463)
+~[storm-core-0.9.3.jar:0.9.3]
+    at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
+    at java.lang.Thread.run(Thread.java:745) [na:1.7.0_65]
+Caused by: java.lang.NullPointerException: null
+    at clojure.lang.RT.intCast(RT.java:1087) ~[clojure-1.5.1.jar:na]
+    at
+backtype.storm.daemon.worker$mk_transfer_fn$fn__3548.invoke(worker.clj:129)
+~[storm-core-0.9.3.jar:0.9.3]
+    at
+backtype.storm.daemon.executor$start_batch_transfer__GT_worker_handler_BANG_$fn__3282.invoke(executor.clj:258)
+~[storm-core-0.9.3.jar:0.9.3]
+    at
+backtype.storm.disruptor$clojure_handler$reify__746.onEvent(disruptor.clj:58)
+~[storm-core-0.9.3.jar:0.9.3]
+    at
+backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:125)
+~[storm-core-0.9.3.jar:0.9.3]
+    ... 6 common frames omitted
+```
+
 Solution:
 
 * This is caused by having multiple threads issue methods on the `OutputCollector`. All emits, acks, and fails must happen on the same thread. One subtle way this can happen is if you make a `IBasicBolt` that emits on a separate thread. `IBasicBolt`'s automatically ack after execute is called, so this would cause multiple threads to use the `OutputCollector` leading to this exception. When using a basic bolt, all emits must happen in the same thread that runs `execute`.

Modified: storm/branches/bobby-versioned-site/releases/0.10.0/Tutorial.md
URL: http://svn.apache.org/viewvc/storm/branches/bobby-versioned-site/releases/0.10.0/Tutorial.md?rev=1735510&r1=1735509&r2=1735510&view=diff
==============================================================================
--- storm/branches/bobby-versioned-site/releases/0.10.0/Tutorial.md (original)
+++ storm/branches/bobby-versioned-site/releases/0.10.0/Tutorial.md Thu Mar 17 22:05:29 2016
@@ -140,23 +140,28 @@ As you can see, the implementation is ve
 public static class ExclamationBolt implements IRichBolt {
   OutputCollector _collector;
 
+  @Override
   public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
     _collector = collector;
   }
 
+  @Override
   public void execute(Tuple tuple) {
     _collector.emit(tuple, new Values(tuple.getString(0) + "!!!"));
     _collector.ack(tuple);
   }
 
+  @Override
   public void cleanup() {
   }
 
+  @Override
   public void declareOutputFields(OutputFieldsDeclarer declarer) {
     declarer.declare(new Fields("word"));
   }
 
-  public Map getComponentConfiguration() {
+  @Override
+  public Map<String, Object> getComponentConfiguration() {
     return null;
   }
 }
@@ -164,9 +169,9 @@
 The `prepare` method provides the bolt with an `OutputCollector` that is used for emitting tuples from this bolt. Tuples can be emitted at anytime from the bolt -- in the `prepare`, `execute`, or `cleanup` methods, or even asynchronously in another thread. This `prepare` implementation simply saves the `OutputCollector` as an instance variable to be used later on in the `execute` method.
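Since the next two paragraphs call out anchoring and acking, here is the tutorial's `execute` method again with comments marking those two reliability details; the comments are editorial and the code itself is unchanged from the block above:

```java
@Override
public void execute(Tuple tuple) {
    // Anchored emit: passing the input tuple as the first argument ties the new
    // tuple into the input's tuple tree, so a downstream failure causes the
    // originating spout tuple to be replayed.
    _collector.emit(tuple, new Values(tuple.getString(0) + "!!!"));

    // Acking tells Storm this bolt has fully processed the input tuple. Omitting
    // the anchor, as in emit(new Values(...)), would leave the new tuple untracked.
    _collector.ack(tuple);
}
```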
-The `execute` method receives a tuple from one of the bolt's inputs. The `ExclamationBolt` grabs the first field from the tuple and emits a new tuple with the string "!!!" appended to it. If you implement a bolt that subscribes to multiple input sources, you can find out which component the [Tuple](javadocs/backtype/storm/tuple/Tuple.html) came from by using the `Tuple#getSourceComponent` method.
+The `execute` method receives a tuple from one of the bolt's inputs. The `ExclamationBolt` grabs the first field from the tuple and emits a new tuple with the string "!!!" appended to it. If you implement a bolt that subscribes to multiple input sources, you can find out which component the [Tuple](/javadoc/apidocs/backtype/storm/tuple/Tuple.html) came from by using the `Tuple#getSourceComponent` method.
 
-There's a few other things going in in the `execute` method, namely that the input tuple is passed as the first argument to `emit` and the input tuple is acked on the final line. These are part of Storm's reliability API for guaranteeing no data loss and will be explained later in this tutorial.
+There's a few other things going on in the `execute` method, namely that the input tuple is passed as the first argument to `emit` and the input tuple is acked on the final line. These are part of Storm's reliability API for guaranteeing no data loss and will be explained later in this tutorial.
 
 The `cleanup` method is called when a Bolt is being shutdown and should cleanup any resources that were opened. There's no guarantee that this method will be called on the cluster: for example, if the machine the task is running on blows up, there's no way to invoke the method. The `cleanup` method is intended for when you run topologies in [local mode](Local-mode.html) (where a Storm cluster is simulated in process), and you want to be able to run and kill many topologies without suffering any resource leaks.
 
@@ -180,15 +185,18 @@ Methods like `cleanup` and `getComponent
 
 public static class ExclamationBolt extends BaseRichBolt {
   OutputCollector _collector;
 
+  @Override
   public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
     _collector = collector;
   }
 
+  @Override
   public void execute(Tuple tuple) {
     _collector.emit(tuple, new Values(tuple.getString(0) + "!!!"));
     _collector.ack(tuple);
   }
 
+  @Override
   public void declareOutputFields(OutputFieldsDeclarer declarer) {
     declarer.declare(new Fields("word"));
   }

Modified: storm/branches/bobby-versioned-site/releases/0.10.0/Understanding-the-parallelism-of-a-Storm-topology.md
URL: http://svn.apache.org/viewvc/storm/branches/bobby-versioned-site/releases/0.10.0/Understanding-the-parallelism-of-a-Storm-topology.md?rev=1735510&r1=1735509&r2=1735510&view=diff
==============================================================================
--- storm/branches/bobby-versioned-site/releases/0.10.0/Understanding-the-parallelism-of-a-Storm-topology.md (original)
+++ storm/branches/bobby-versioned-site/releases/0.10.0/Understanding-the-parallelism-of-a-Storm-topology.md Thu Mar 17 22:05:29 2016
@@ -38,7 +38,7 @@ The following sections give an overview
 
 ### Number of executors (threads)
 
 * Description: How many executors to spawn _per component_.
-* Configuration option: ?
+* Configuration option: None (pass ``parallelism_hint`` parameter to ``setSpout`` or ``setBolt``)
 * How to set in your code (examples):
     * [TopologyBuilder#setSpout()](javadocs/backtype/storm/topology/TopologyBuilder.html)
     * [TopologyBuilder#setBolt()](javadocs/backtype/storm/topology/TopologyBuilder.html)
 
@@ -57,7 +57,7 @@ Here is an example code snippet to show
 
 ```java
 topologyBuilder.setBolt("green-bolt", new GreenBolt(), 2)
                .setNumTasks(4)
-               .shuffleGrouping("blue-spout);
+               .shuffleGrouping("blue-spout");
 ```
 
 In the above code we configured Storm to run the bolt ``GreenBolt`` with an initial number of two executors and four associated tasks. Storm will run two tasks per executor (thread). If you do not explicitly configure the number of tasks, Storm will run by default one task per executor.

Modified: storm/branches/bobby-versioned-site/releases/0.10.0/Using-non-JVM-languages-with-Storm.md
URL: http://svn.apache.org/viewvc/storm/branches/bobby-versioned-site/releases/0.10.0/Using-non-JVM-languages-with-Storm.md?rev=1735510&r1=1735509&r2=1735510&view=diff
==============================================================================
--- storm/branches/bobby-versioned-site/releases/0.10.0/Using-non-JVM-languages-with-Storm.md (original)
+++ storm/branches/bobby-versioned-site/releases/0.10.0/Using-non-JVM-languages-with-Storm.md Thu Mar 17 22:05:29 2016
@@ -1,4 +1,5 @@
 ---
+title: Using non JVM languages with Storm
 layout: documentation
 version: v0.10.0
 ---

Modified: storm/branches/bobby-versioned-site/releases/0.10.0/images/topology.png
URL: http://svn.apache.org/viewvc/storm/branches/bobby-versioned-site/releases/0.10.0/images/topology.png?rev=1735510&r1=1735509&r2=1735510&view=diff
==============================================================================
Binary files - no diff available.
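Finally, to tie the parallelism hunk above together, a hedged sketch of how the three knobs (worker processes, executors, tasks) are commonly set when building a topology; `BlueSpout` and `GreenBolt` stand in for the example components named in the snippet above:

```java
Config conf = new Config();
conf.setNumWorkers(2); // number of worker processes for the topology

TopologyBuilder topologyBuilder = new TopologyBuilder();
// parallelism_hint = 2 -> two executors (threads) for this spout
topologyBuilder.setSpout("blue-spout", new BlueSpout(), 2);
// two executors but four tasks -> Storm runs two tasks per executor
topologyBuilder.setBolt("green-bolt", new GreenBolt(), 2)
               .setNumTasks(4)
               .shuffleGrouping("blue-spout");
```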