1. Are you trying to do a stream-stream join? You can easily join N streams using a single Samza job. Each task consumes the same partition number from every input stream, so as long as all the streams are partitioned by the join key and have the same number of partitions, messages with matching keys end up in the same task. You can also over-partition your input Kafka streams so that you can later scale out to a large number of containers.
2. I'm not entirely clear on this one. Could you elaborate on (2) with an example? What are the input streams being processed? How are they generated? What operation are you trying to perform on the input streams' data? (What output do you expect?)

Thanks,
Jagadish

On Thu, Dec 3, 2015 at 9:59 PM, Josh Morris wrote:

> A colleague and I are trying to understand Samza a bit more, and the ideas
> behind it, realized or to be realized. We've been through some of the
> referenced videos/articles and documentation, and were discussing a couple
> of use cases that we weren't sure how would be solved.
>
> Use case 1 is about how a multi-join query in SQL would be represented in
> Samza. The description of joins on the state management [1] page only
> gives examples of 2-table joins. We have use cases with a much higher
> number of table joins in order to flatten (denormalize) data to store in
> our reporting database. If there were N joins, would this be a Samza job
> with N input streams, or N-1 jobs each with at most 2 input streams, where
> jobs in layer 2 and above have one of their input streams come from the
> output of the previous job?
>
> Use case 2 is about how to apply referential integrity. Using a shopping
> cart analogy, say I add a product to my cart. The cart is represented by
> an order record, and the product being added is represented by an item
> record containing a foreign key to the order record. In a traditional DB
> setting, if I try to insert an item record with an order id in it, and the
> order with that id doesn't exist, my referential integrity checks on the
> DB would fail. How does this work in the Samza case? Do all writes to the
> log (Kafka) succeed, and I do the integrity check later when creating my
> view?
>
> [1] http://samza.apache.org/learn/documentation/0.9/container/state-management.html
>
> Thanks
> Josh

-- 
Jagadish V,
Graduate Student,
Department of Computer Science,
Stanford University
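To make point 1 concrete, here is a minimal, library-free Java sketch of what each task in such a single N-way join job does. It deliberately does not use the Samza API: the class names, the `partitionFor` helper, and the `orders`/`payments`/`shipments` streams are all hypothetical, and the join state is an in-memory map where a real job would use the task's local key-value store. It assumes every input stream is keyed by the same join key (here, an order id) and has the same partition count.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical, library-free sketch of the per-task logic of an N-way
 * stream-stream join inside a single job. Not the Samza API: in a real job
 * the state would live in the task's key-value store and the container
 * would call process() for each incoming message.
 */
public class NWayJoinSketch {

    // Hypothetical partitioner. All input streams must use the same join key
    // and the same partition count, so matching records share a partition
    // number and therefore land in the same task.
    static int partitionFor(String joinKey, int numPartitions) {
        return Math.floorMod(joinKey.hashCode(), numPartitions);
    }

    /** One task's state: join key -> (stream name -> latest record). */
    static class JoinTask {
        private final List<String> inputStreams;
        private final Map<String, Map<String, String>> state = new HashMap<>();

        JoinTask(List<String> inputStreams) {
            this.inputStreams = inputStreams;
        }

        /**
         * Called once per incoming message. Returns the joined (denormalized)
         * row once a record for this key has arrived on every input stream,
         * otherwise null. Later updates re-emit the row with the newest values.
         */
        String process(String stream, String joinKey, String record) {
            Map<String, String> byStream =
                state.computeIfAbsent(joinKey, k -> new HashMap<>());
            byStream.put(stream, record);
            if (byStream.size() < inputStreams.size()) {
                return null; // still waiting on at least one stream
            }
            // Emit fields in a fixed stream order
            StringBuilder joined = new StringBuilder(joinKey);
            for (String s : inputStreams) {
                joined.append('|').append(byStream.get(s));
            }
            return joined.toString();
        }
    }

    public static void main(String[] args) {
        int numPartitions = 8; // over-partitioned to leave room for more containers
        List<String> streams = List.of("orders", "payments", "shipments");

        // One task per partition number; each task would consume that
        // partition of all three input streams.
        List<JoinTask> tasks = new ArrayList<>();
        for (int p = 0; p < numPartitions; p++) {
            tasks.add(new JoinTask(streams));
        }

        String key = "order-42";
        JoinTask task = tasks.get(partitionFor(key, numPartitions));
        System.out.println(task.process("orders", key, "total=30"));      // prints: null
        System.out.println(task.process("payments", key, "paid=true"));   // prints: null
        System.out.println(task.process("shipments", key, "carrier=UPS"));
        // prints: order-42|total=30|paid=true|carrier=UPS
    }
}
```

Because a task only ever sees keys that map to its partition, adding containers just redistributes whole tasks among them, which is why over-partitioning the input topics up front leaves room to scale out. A stream keyed by something else (e.g. a customer id) would first need to be repartitioned by the join key before it could take part in this single-job join.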
