You should use the grouping to do the routing. Typically in these cases you would partition the database so that each bolt holds a subset of the keys, and both updates and queries are routed to the right partition of the DB by key with a fieldsGrouping (groupBy in Trident).
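
To make the routing idea concrete, here is a minimal sketch in plain Java. It is not Storm or Trident API: the PartitionedStore class and its method names are made up for illustration, and it only mimics what a fields grouping does, which is to send every tuple with the same key to the same partition. In a real Trident topology the grouping does this for you, so both the state update and the DRPC state query would need to be grouped on the same key field.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: this is NOT Storm/Trident API.
// Each "partition" stands in for the in-memory state held by one executor.
class PartitionedStore {
    private final List<Map<String, List<String>>> partitions = new ArrayList<>();

    PartitionedStore(int numPartitions) {
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new HashMap<>());
        }
    }

    // The same key always maps to the same partition index,
    // which is the property a fields grouping provides.
    int partitionFor(String key) {
        return Math.floorMod(key.hashCode(), partitions.size());
    }

    // Writes are routed by key.
    void append(String key, String value) {
        partitions.get(partitionFor(key))
                  .computeIfAbsent(key, k -> new ArrayList<>())
                  .add(value);
    }

    // Queries are routed by the same key, so they always reach
    // the partition that received the writes for that key.
    List<String> query(String key) {
        return partitions.get(partitionFor(key))
                         .getOrDefault(key, new ArrayList<>());
    }

    public static void main(String[] args) {
        PartitionedStore store = new PartitionedStore(2);
        store.append("vertex-a", "event 1");
        store.append("vertex-a", "event 2");
        store.append("vertex-b", "event 3");
        // Repeating the query gives the same answer every time,
        // because the key alone decides which partition is asked.
        System.out.println(store.query("vertex-a"));
        System.out.println(store.query("vertex-a"));
    }
}
```

The point of the sketch is that the query never picks a partition at random: as long as updates and queries are routed on the same key, repeated queries for that key return the same data regardless of the parallelism.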
For an example of this you can look at the trident word count example

https://github.com/nathanmarz/storm-starter/blob/master/src/jvm/storm/starter/trident/TridentWordCount.java

-Bobby

On 12/18/13, 4:43 PM, "Josh Walton" <[email protected]> wrote:

>I apologize if this is a duplicate post/message. I had forgotten that the
>user group is no longer in use.
>
>I have an in memory, Java database that I want to use for my Trident
>State.
>My testing consists of running a limited set of data through the topology,
>and then querying the results through a DRPC stream that performs a
>Trident
>state query.
>
>When the trident state is in a single executor, the same state query
>returns the same data every time. When the trident state is in a bolt with
>multiple executors, I get X different results for the same state query
>where X is the number of executors containing the trident state.
>
>Is there a way to specify which trident state is used for a DRPC query? Is
>there a way to get the results (and aggregate) from the all of the
>different trident state instances?
>
>So I created a sample project in which to show the problem. Here's the
>repository: https://github.com/jwalton922/InMemoryTridentStateTest
>
>I submit two topologies like:
>
>storm jar
>TestTopology/target/TestTopology-1.0-SNAPSHOT-jar-with-dependencies.jar
>com.mrcy.testtopology.TestTopologyBuilderry 1
>
>storm jar
>TestTopology/target/TestTopology-1.0-SNAPSHOT-jar-with-dependencies.jar
>com.mrcy.testtopology.TestTopologyBuilderry 2
>
>When I query the first topology (with parallelism 1) I get the following
>output indefinitely like:
>DRPC result: [["",["event 1","event 2","event 3","event 1"]]]
>DRPC result: [["",["event 1","event 2","event 3","event 1"]]]
>DRPC result: [["",["event 1","event 2","event 3","event 1"]]]
>DRPC result: [["",["event 1","event 2","event 3","event 1"]]]
>DRPC result: [["",["event 1","event 2","event 3","event 1"]]]
>DRPC result: [["",["event 1","event 2","event 3","event 1"]]]
>DRPC result: [["",["event 1","event 2","event 3","event 1"]]]
>DRPC result: [["",["event 1","event 2","event 3","event 1"]]]
>DRPC result: [["",["event 1","event 2","event 3","event 1"]]]
>DRPC result: [["",["event 1","event 2","event 3","event 1"]]]
>
>When I query the topology with parallelism of 2, I get output like:
>DRPC result: [["",["event 2","event 1"]]]
>DRPC result: [["",["event 2","event 1"]]]
>DRPC result: [["",["event 2","event 1"]]]
>DRPC result: [["",["event 2","event 1"]]]
>DRPC result: [["",["event 1","event 3"]]]
>DRPC result: [["",["event 1","event 3"]]]
>DRPC result: [["",["event 1","event 3"]]]
>DRPC result: [["",["event 2","event 1"]]]
>DRPC result: [["",["event 2","event 1"]]]
>DRPC result: [["",["event 1","event 3"]]]
>DRPC result: [["",["event 1","event 3"]]]
>DRPC result: [["",["event 1","event 3"]]]
>DRPC result: [["",["event 2","event 1"]]]
>DRPC result: [["",["event 2","event 1"]]]
>
>As you can see, when the in memory trident state is in more than one
>executor, you don't know which state will be queried, and your results
>will
>potentially differ between DRPC queries.
>
>My real problem involves hosting an in memory graph database. I'd like to
>be able to partition the graph by some vertex property, but then I would
>need to be able to query across multiple executors of my trident state so
>that I could return vertices and edges cross partitions.
