You should be using the grouping to do the routing.  Typically in these
cases you would partition the database so that each bolt instance holds a
subset of the keys, and tuples are routed to the right partition based on
a fieldsGrouping (groupBy in Trident).
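
To make the idea concrete, here is a minimal plain-Java sketch of key-based
partitioning. It is not Storm code and not how Storm implements its
groupings internally; PartitionedStore, partitionFor, put and get are
hypothetical names made up for illustration, and the hash-modulo rule is
just an assumed stand-in for "the same key always goes to the same
partition".

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustration only: each inner map stands in for the in-memory state
// held by one executor. All names here are hypothetical.
class PartitionedStore {
    private final List<Map<String, List<String>>> partitions = new ArrayList<>();

    PartitionedStore(int numPartitions) {
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new HashMap<>());
        }
    }

    // Assumed routing rule: hash of the key modulo the partition count.
    // floorMod keeps the index non-negative for negative hash codes.
    int partitionFor(String key) {
        return Math.floorMod(key.hashCode(), partitions.size());
    }

    // Writes go only to the partition that owns the key.
    void put(String key, String event) {
        partitions.get(partitionFor(key))
                  .computeIfAbsent(key, k -> new ArrayList<>())
                  .add(event);
    }

    // Reads use the same routing rule, so they always reach the
    // partition that received the writes for that key.
    List<String> get(String key) {
        return partitions.get(partitionFor(key))
                         .getOrDefault(key, new ArrayList<>());
    }
}
```

Because reads and writes share one routing rule, a lookup for a given key
gives the same answer no matter how many partitions there are. In a
topology the grouping plays the role of partitionFor, so the query stream
needs to be grouped on the same field as the stream that updates the state.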

For an example of this, you can look at the Trident word count example:

https://github.com/nathanmarz/storm-starter/blob/master/src/jvm/storm/starter/trident/TridentWordCount.java

- Bobby

On 12/18/13, 4:43 PM, "Josh Walton" <[email protected]> wrote:

>I apologize if this is a duplicate post/message. I had forgotten that the
>user group is no longer in use.
>
>I have an in memory, Java database that I want to use for my Trident State.
>My testing consists of running a limited set of data through the topology,
>and then querying the results through a DRPC stream that performs a Trident
>state query.
>
>When the trident state is in a single executor, the same state query
>returns the same data every time. When the trident state is in a bolt with
>multiple executors, I get X different results for the same state query
>where X is the number of executors containing the trident state.
>
>Is there a way to specify which trident state is used for a DRPC query? Is
>there a way to get the results (and aggregate) from all of the different
>trident state instances?
>
>So I created a sample project in which to show the problem. Here's the
>repository: https://github.com/jwalton922/InMemoryTridentStateTest
>
>I submit two topologies like:
>
>storm jar
>TestTopology/target/TestTopology-1.0-SNAPSHOT-jar-with-dependencies.jar
>com.mrcy.testtopology.TestTopologyBuilderry 1
>
>storm jar
>TestTopology/target/TestTopology-1.0-SNAPSHOT-jar-with-dependencies.jar
>com.mrcy.testtopology.TestTopologyBuilderry 2
>
>When I query the first topology (with parallelism 1) I get the following
>output indefinitely like:
>DRPC result: [["",["event 1","event 2","event 3","event 1"]]]
>DRPC result: [["",["event 1","event 2","event 3","event 1"]]]
>DRPC result: [["",["event 1","event 2","event 3","event 1"]]]
>DRPC result: [["",["event 1","event 2","event 3","event 1"]]]
>DRPC result: [["",["event 1","event 2","event 3","event 1"]]]
>DRPC result: [["",["event 1","event 2","event 3","event 1"]]]
>DRPC result: [["",["event 1","event 2","event 3","event 1"]]]
>DRPC result: [["",["event 1","event 2","event 3","event 1"]]]
>DRPC result: [["",["event 1","event 2","event 3","event 1"]]]
>DRPC result: [["",["event 1","event 2","event 3","event 1"]]]
>
>When I query the topology with parallelism of 2, I get output like:
>DRPC result: [["",["event 2","event 1"]]]
>DRPC result: [["",["event 2","event 1"]]]
>DRPC result: [["",["event 2","event 1"]]]
>DRPC result: [["",["event 2","event 1"]]]
>DRPC result: [["",["event 1","event 3"]]]
>DRPC result: [["",["event 1","event 3"]]]
>DRPC result: [["",["event 1","event 3"]]]
>DRPC result: [["",["event 2","event 1"]]]
>DRPC result: [["",["event 2","event 1"]]]
>DRPC result: [["",["event 1","event 3"]]]
>DRPC result: [["",["event 1","event 3"]]]
>DRPC result: [["",["event 1","event 3"]]]
>DRPC result: [["",["event 2","event 1"]]]
>DRPC result: [["",["event 2","event 1"]]]
>
>As you can see, when the in memory trident state is in more than one
>executor, you don't know which state will be queried, and your results will
>potentially differ between DRPC queries.
>
>My real problem involves hosting an in memory graph database. I'd like to
>be able to partition the graph by some vertex property, but then I would
>need to be able to query across multiple executors of my trident state so
>that I could return vertices and edges across partitions.
