Repository: storm

Updated Branches:
  refs/heads/master a558a1f07 -> 946042124
STORM-1749: Fix storm-starter github links


Project: http://git-wip-us.apache.org/repos/asf/storm/repo
Commit: http://git-wip-us.apache.org/repos/asf/storm/commit/0458b007
Tree: http://git-wip-us.apache.org/repos/asf/storm/tree/0458b007
Diff: http://git-wip-us.apache.org/repos/asf/storm/diff/0458b007

Branch: refs/heads/master
Commit: 0458b0078dcaaeb1d7290e8c77c31cd46ac0f1e2
Parents: a536937
Author: Abhishek Agarwal <[email protected]>
Authored: Thu May 5 23:41:40 2016 +0530
Committer: Abhishek Agarwal <[email protected]>
Committed: Thu May 5 23:41:40 2016 +0530

----------------------------------------------------------------------
 docs/Clojure-DSL.md                    |  4 ++--
 docs/Common-patterns.md                |  4 ++--
 docs/Distributed-RPC.md                |  2 +-
 docs/Transactional-topologies.md       |  8 ++++----
 docs/Trident-state.md                  |  4 ++--
 docs/Tutorial.md                       |  2 +-
 examples/storm-starter/README.markdown | 12 ++++++------
 7 files changed, 18 insertions(+), 18 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/storm/blob/0458b007/docs/Clojure-DSL.md
----------------------------------------------------------------------
diff --git a/docs/Clojure-DSL.md b/docs/Clojure-DSL.md
index 369695a..1aa3393 100644
--- a/docs/Clojure-DSL.md
+++ b/docs/Clojure-DSL.md
@@ -17,7 +17,7 @@ This page outlines all the pieces of the Clojure DSL, including:
 
 To define a topology, use the `topology` function. `topology` takes in two arguments: a map of "spout specs" and a map of "bolt specs". Each spout and bolt spec wires the code for the component into the topology by specifying things like inputs and parallelism.
-Let's take a look at an example topology definition [from the storm-starter project]({{page.git-blob-base}}/examples/storm-starter/src/clj/storm/starter/clj/word_count.clj):
+Let's take a look at an example topology definition [from the storm-starter project]({{page.git-blob-base}}/examples/storm-starter/src/clj/org/apache/storm/starter/clj/word_count.clj):
 
 ```clojure
 (topology
@@ -203,7 +203,7 @@ The signature for `defspout` looks like the following:
 
 If you leave out the option map, it defaults to {:prepare true}. The output declaration for `defspout` has the same syntax as `defbolt`.
 
-Here's an example `defspout` implementation from [storm-starter]({{page.git-blob-base}}/examples/storm-starter/src/clj/storm/starter/clj/word_count.clj):
+Here's an example `defspout` implementation from [storm-starter]({{page.git-blob-base}}/examples/storm-starter/src/clj/org/apache/storm/starter/clj/word_count.clj):
 
 ```clojure
 (defspout sentence-spout ["sentence"]

http://git-wip-us.apache.org/repos/asf/storm/blob/0458b007/docs/Common-patterns.md
----------------------------------------------------------------------
diff --git a/docs/Common-patterns.md b/docs/Common-patterns.md
index 9f5ffe7..b98fe90 100644
--- a/docs/Common-patterns.md
+++ b/docs/Common-patterns.md
@@ -70,7 +70,7 @@ builder.setBolt("merge", new MergeObjects())
         .globalGrouping("rank");
 ```
 
-This pattern works because of the fields grouping done by the first bolt which gives the partitioning you need for this to be semantically correct. You can see an example of this pattern in storm-starter [here]({{page.git-blob-base}}/examples/storm-starter/src/jvm/storm/starter/RollingTopWords.java).
+This pattern works because of the fields grouping done by the first bolt which gives the partitioning you need for this to be semantically correct. You can see an example of this pattern in storm-starter [here]({{page.git-blob-base}}/examples/storm-starter/src/jvm/org/apache/storm/starter/RollingTopWords.java).
 
 If however you have a known skew in the data being processed it can be advantageous to use partialKeyGrouping instead of fieldsGrouping. This will distribute the load for each key between two downstream bolts instead of a single one.
@@ -83,7 +83,7 @@ builder.setBolt("merge", new MergeRanksObjects())
         .globalGrouping("rank");
 ```
 
-The topology needs an extra layer of processing to aggregate the partial counts from the upstream bolts but this only processes aggregated values now so the bolt it is not subject to the load caused by the skewed data. You can see an example of this pattern in storm-starter [here]({{page.git-blob-base}}/examples/storm-starter/src/jvm/storm/starter/SkewedRollingTopWords.java).
+The topology needs an extra layer of processing to aggregate the partial counts from the upstream bolts but this only processes aggregated values now so the bolt it is not subject to the load caused by the skewed data. You can see an example of this pattern in storm-starter [here]({{page.git-blob-base}}/examples/storm-starter/src/jvm/org/apache/storm/starter/SkewedRollingTopWords.java).
 
 ### TimeCacheMap for efficiently keeping a cache of things that have been recently updated

http://git-wip-us.apache.org/repos/asf/storm/blob/0458b007/docs/Distributed-RPC.md
----------------------------------------------------------------------
diff --git a/docs/Distributed-RPC.md b/docs/Distributed-RPC.md
index 2ad63e5..b20419a 100644
--- a/docs/Distributed-RPC.md
+++ b/docs/Distributed-RPC.md
@@ -118,7 +118,7 @@ The reach of a URL is the number of unique people exposed to a URL on Twitter. T
 
 A single reach computation can involve thousands of database calls and tens of millions of follower records during the computation. It's a really, really intense computation. As you're about to see, implementing this function on top of Storm is dead simple. On a single machine, reach can take minutes to compute; on a Storm cluster, you can compute reach for even the hardest URLs in a couple seconds.
 
-A sample reach topology is defined in storm-starter [here]({{page.git-blob-base}}/examples/storm-starter/src/jvm/storm/starter/ReachTopology.java). Here's how you define the reach topology:
+A sample reach topology is defined in storm-starter [here]({{page.git-blob-base}}/examples/storm-starter/src/jvm/org/apache/storm/starter/ReachTopology.java). Here's how you define the reach topology:
 
 ```java
 LinearDRPCTopologyBuilder builder = new LinearDRPCTopologyBuilder("reach");

http://git-wip-us.apache.org/repos/asf/storm/blob/0458b007/docs/Transactional-topologies.md
----------------------------------------------------------------------
diff --git a/docs/Transactional-topologies.md b/docs/Transactional-topologies.md
index a91d6c2..db5509f 100644
--- a/docs/Transactional-topologies.md
+++ b/docs/Transactional-topologies.md
@@ -81,7 +81,7 @@ Finally, another thing to note is that transactional topologies require a source
 
 ## The basics through example
 
-You build transactional topologies by using [TransactionalTopologyBuilder](javadocs/org/apache/storm/transactional/TransactionalTopologyBuilder.html). Here's the transactional topology definition for a topology that computes the global count of tuples from the input stream. This code comes from [TransactionalGlobalCount]({{page.git-blob-base}}/examples/storm-starter/src/jvm/storm/starter/TransactionalGlobalCount.java) in storm-starter.
+You build transactional topologies by using [TransactionalTopologyBuilder](javadocs/org/apache/storm/transactional/TransactionalTopologyBuilder.html). Here's the transactional topology definition for a topology that computes the global count of tuples from the input stream. This code comes from [TransactionalGlobalCount]({{page.git-blob-base}}/examples/storm-starter/src/jvm/org/apache/storm/starter/TransactionalGlobalCount.java) in storm-starter.
 
 ```java
 MemoryTransactionalSpout spout = new MemoryTransactionalSpout(DATA, new Fields("word"), PARTITION_TAKE_PER_BATCH);
@@ -201,7 +201,7 @@ First, notice that this bolt implements the `ICommitter` interface. This tells S
 
 The code for `finishBatch` in `UpdateGlobalCount` gets the current value from the database and compares its transaction id to the transaction id for this batch. If they are the same, it does nothing. Otherwise, it increments the value in the database by the partial count for this batch.
 
-A more involved transactional topology example that updates multiple databases idempotently can be found in storm-starter in the [TransactionalWords]({{page.git-blob-base}}/examples/storm-starter/src/jvm/storm/starter/TransactionalWords.java) class.
+A more involved transactional topology example that updates multiple databases idempotently can be found in storm-starter in the [TransactionalWords]({{page.git-blob-base}}/examples/storm-starter/src/jvm/org/apache/storm/starter/TransactionalWords.java) class.
 
 ## Transactional Topology API
@@ -255,7 +255,7 @@ The details of implementing a `TransactionalSpout` are in [the Javadoc](javadocs
 
 #### Partitioned Transactional Spout
 
-A common kind of transactional spout is one that reads the batches from a set of partitions across many queue brokers. For example, this is how [TransactionalKafkaSpout]({{page.git-tree-base}}/external/storm-kafka/src/jvm/storm/kafka/TransactionalKafkaSpout.java) works. An `IPartitionedTransactionalSpout` automates the bookkeeping work of managing the state for each partition to ensure idempotent replayability. See [the Javadoc](javadocs/org/apache/storm/transactional/partitioned/IPartitionedTransactionalSpout.html) for more details.
+A common kind of transactional spout is one that reads the batches from a set of partitions across many queue brokers. For example, this is how [TransactionalKafkaSpout]({{page.git-tree-base}}/external/storm-kafka/src/jvm/org/apache/storm/kafka/TransactionalKafkaSpout.java) works. An `IPartitionedTransactionalSpout` automates the bookkeeping work of managing the state for each partition to ensure idempotent replayability. See [the Javadoc](javadocs/org/apache/storm/transactional/partitioned/IPartitionedTransactionalSpout.html) for more details.
 
 ### Configuration
@@ -325,7 +325,7 @@ In this scenario, tuples 41-50 are skipped. By failing all subsequent transactio
 By failing all subsequent transactions on failure, no tuples are skipped. This also shows that a requirement of transactional spouts is that they always emit where the last transaction left off.
 
-A non-idempotent transactional spout is more concisely referred to as an "OpaqueTransactionalSpout" (opaque is the opposite of idempotent). [IOpaquePartitionedTransactionalSpout](javadocs/org/apache/storm/transactional/partitioned/IOpaquePartitionedTransactionalSpout.html) is an interface for implementing opaque partitioned transactional spouts, of which [OpaqueTransactionalKafkaSpout]({{page.git-tree-base}}/external/storm-kafka/src/jvm/storm/kafka/OpaqueTransactionalKafkaSpout.java) is an example. `OpaqueTransactionalKafkaSpout` can withstand losing individual Kafka nodes without sacrificing accuracy as long as you use the update strategy as explained in this section.
+A non-idempotent transactional spout is more concisely referred to as an "OpaqueTransactionalSpout" (opaque is the opposite of idempotent). [IOpaquePartitionedTransactionalSpout](javadocs/org/apache/storm/transactional/partitioned/IOpaquePartitionedTransactionalSpout.html) is an interface for implementing opaque partitioned transactional spouts, of which [OpaqueTransactionalKafkaSpout]({{page.git-tree-base}}/external/storm-kafka/src/jvm/org/apache/storm/kafka/OpaqueTransactionalKafkaSpout.java) is an example. `OpaqueTransactionalKafkaSpout` can withstand losing individual Kafka nodes without sacrificing accuracy as long as you use the update strategy as explained in this section.
 
 ## Implementation

http://git-wip-us.apache.org/repos/asf/storm/blob/0458b007/docs/Trident-state.md
----------------------------------------------------------------------
diff --git a/docs/Trident-state.md b/docs/Trident-state.md
index 4ebb60a..bb5b1ee 100644
--- a/docs/Trident-state.md
+++ b/docs/Trident-state.md
@@ -28,7 +28,7 @@ Remember, Trident processes tuples as small batches with each batch being given 
 2. There's no overlap between batches of tuples (tuples are in one batch or another, never multiple).
 3. Every tuple is in a batch (no tuples are skipped)
 
-This is a pretty easy type of spout to understand, the stream is divided into fixed batches that never change. storm-contrib has [an implementation of a transactional spout]({{page.git-tree-base}}/external/storm-kafka/src/jvm/storm/kafka/trident/TransactionalTridentKafkaSpout.java) for Kafka.
+This is a pretty easy type of spout to understand, the stream is divided into fixed batches that never change. storm-contrib has [an implementation of a transactional spout]({{page.git-tree-base}}/external/storm-kafka/src/jvm/org/apache/storm/kafka/trident/TransactionalTridentKafkaSpout.java) for Kafka.
 
 You might be wondering - why wouldn't you just always use a transactional spout? They're simple and easy to understand. One reason you might not use one is because they're not necessarily very fault-tolerant. For example, the way TransactionalTridentKafkaSpout works is the batch for a txid will contain tuples from all the Kafka partitions for a topic. Once a batch has been emitted, any time that batch is re-emitted in the future the exact same set of tuples must be emitted to meet the semantics of transactional spouts. Now suppose a batch is emitted from TransactionalTridentKafkaSpout, the batch fails to process, and at the same time one of the Kafka nodes goes down. You're now incapable of replaying the same batch as you did before (since the node is down and some partitions for the topic are not unavailable), and processing will halt.
@@ -72,7 +72,7 @@ As described before, an opaque transactional spout cannot guarantee that the bat
 
 1. Every tuple is *successfully* processed in exactly one batch. However, it's possible for a tuple to fail to process in one batch and then succeed to process in a later batch.
 
-[OpaqueTridentKafkaSpout]({{page.git-tree-base}}/external/storm-kafka/src/jvm/storm/kafka/trident/OpaqueTridentKafkaSpout.java) is a spout that has this property and is fault-tolerant to losing Kafka nodes. Whenever it's time for OpaqueTridentKafkaSpout to emit a batch, it emits tuples starting from where the last batch finished emitting. This ensures that no tuple is ever skipped or successfully processed by multiple batches.
+[OpaqueTridentKafkaSpout]({{page.git-tree-base}}/external/storm-kafka/src/jvm/org/apache/storm/kafka/trident/OpaqueTridentKafkaSpout.java) is a spout that has this property and is fault-tolerant to losing Kafka nodes. Whenever it's time for OpaqueTridentKafkaSpout to emit a batch, it emits tuples starting from where the last batch finished emitting. This ensures that no tuple is ever skipped or successfully processed by multiple batches.
 
 With opaque transactional spouts, it's no longer possible to use the trick of skipping state updates if the transaction id in the database is the same as the transaction id for the current batch. This is because the batch may have changed between state updates.
http://git-wip-us.apache.org/repos/asf/storm/blob/0458b007/docs/Tutorial.md
----------------------------------------------------------------------
diff --git a/docs/Tutorial.md b/docs/Tutorial.md
index 95e4283..5dad834 100644
--- a/docs/Tutorial.md
+++ b/docs/Tutorial.md
@@ -245,7 +245,7 @@ A stream grouping tells a topology how to send tuples between two components. Re
 
 When a task for Bolt A emits a tuple to Bolt B, which task should it send the tuple to?
 
-A "stream grouping" answers this question by telling Storm how to send tuples between sets of tasks. Before we dig into the different kinds of stream groupings, let's take a look at another topology from [storm-starter](http://github.com/apache/storm/blob/{{page.version}}/examples/storm-starter). This [WordCountTopology]({{page.git-blob-base}}/examples/storm-starter/src/jvm/storm/starter/WordCountTopology.java) reads sentences off of a spout and streams out of `WordCountBolt` the total number of times it has seen that word before:
+A "stream grouping" answers this question by telling Storm how to send tuples between sets of tasks. Before we dig into the different kinds of stream groupings, let's take a look at another topology from [storm-starter](http://github.com/apache/storm/blob/{{page.version}}/examples/storm-starter). This [WordCountTopology]({{page.git-blob-base}}/examples/storm-starter/src/jvm/org/apache/storm/starter/WordCountTopology.java) reads sentences off of a spout and streams out of `WordCountBolt` the total number of times it has seen that word before:
 
 ```java
 TopologyBuilder builder = new TopologyBuilder();

http://git-wip-us.apache.org/repos/asf/storm/blob/0458b007/examples/storm-starter/README.markdown
----------------------------------------------------------------------
diff --git a/examples/storm-starter/README.markdown b/examples/storm-starter/README.markdown
index 48d9d27..d68ca6b 100644
--- a/examples/storm-starter/README.markdown
+++ b/examples/storm-starter/README.markdown
@@ -35,13 +35,13 @@ code.
 storm-starter contains a variety of examples of using Storm. If this is your first time working with Storm, check out
 these topologies first:
 
-1. [ExclamationTopology](src/jvm/storm/starter/ExclamationTopology.java): Basic topology written in all Java
-2. [WordCountTopology](src/jvm/storm/starter/WordCountTopology.java): Basic topology that makes use of multilang by
+1. [ExclamationTopology](src/jvm/org/apache/storm/starter/ExclamationTopology.java): Basic topology written in all Java
+2. [WordCountTopology](src/jvm/org/apache/storm/starter/WordCountTopology.java): Basic topology that makes use of multilang by
    implementing one bolt in Python
-3. [ReachTopology](src/jvm/storm/starter/ReachTopology.java): Example of complex DRPC on top of Storm
+3. [ReachTopology](src/jvm/org/apache/storm/starter/ReachTopology.java): Example of complex DRPC on top of Storm
 
 After you have familiarized yourself with these topologies, take a look at the other topopologies in
-[src/jvm/storm/starter/](src/jvm/storm/starter/) such as [RollingTopWords](src/jvm/storm/starter/RollingTopWords.java)
+[src/jvm/org/apache/storm/starter/](src/jvm/org/apache/storm/starter/) such as [RollingTopWords](src/jvm/org/apache/storm/starter/RollingTopWords.java)
 for more advanced implementations.
 
 If you want to learn more about how Storm works, please head over to the
@@ -99,9 +99,9 @@ With submitting you can run topologies which use multilang, for example, `WordCo
 
 _Submitting a topology in local vs. remote mode:_ It depends on the actual code of a topology how you can or even must
 tell Storm whether to run the topology locally (in an in-memory LocalCluster instance of Storm) or remotely (in a
 "real" Storm cluster). In the case of
-[RollingTopWords](src/jvm/storm/starter/RollingTopWords.java), for instance, this can be done by passing command line
+[RollingTopWords](src/jvm/org/apache/storm/starter/RollingTopWords.java), for instance, this can be done by passing command line
 arguments.
-Topologies other than `RollingTopWords` -- such as [ExclamationTopology](src/jvm/storm/starter/ExclamationTopology.java)
+Topologies other than `RollingTopWords` -- such as [ExclamationTopology](src/jvm/org/apache/storm/starter/ExclamationTopology.java)
 -- may behave differently, e.g. by always submitting to a remote cluster (i.e. hardcoded in a way that you, as a user,
 cannot change without modifying the topology code), or by requiring a customized configuration file that the topology
 code will parse prior submitting the topology to Storm. Similarly, further options such as the name of the topology may
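For anyone applying the same package-path migration to other docs, the substitutions in this diff boil down to a mechanical rewrite. Here is a minimal shell sketch; it is illustrative only (the `fix_links` helper name is made up), and it covers just the path shapes that appear in this diff:

```shell
#!/bin/sh
# Illustrative sketch only -- not part of the committed patch. Rewrites the
# old storm-starter/storm-kafka source paths to the new org/apache/storm
# package layout, which is the transformation this commit applies to the docs.

# Rewrite the path shapes touched by this commit on stdin.
fix_links() {
  sed -e 's#src/jvm/storm/starter#src/jvm/org/apache/storm/starter#g' \
      -e 's#src/clj/storm/starter#src/clj/org/apache/storm/starter#g' \
      -e 's#src/jvm/storm/kafka#src/jvm/org/apache/storm/kafka#g'
}

# Demo on a sample link from the diff above.
printf 'see src/jvm/storm/starter/RollingTopWords.java\n' | fix_links
# -> see src/jvm/org/apache/storm/starter/RollingTopWords.java
```

After such a rewrite, something like `grep -r 'src/jvm/storm/starter' docs/` returning no matches is a quick check that no stale links remain.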
