1. Are you trying to do a stream-stream join? You can join N streams with
a single Samza job. Each task processes messages from the same partition
of every input stream, so as long as all the streams are partitioned by
the join key into the same number of partitions, the records that need to
be joined end up in the same task. You can also over-partition your input
Kafka streams so that you can potentially scale out to a large number of
containers.
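To make the single-job approach in (1) concrete, here is a minimal Python
model of it. It is not Samza code: it assumes every input stream is keyed
by the same join key and has the same partition count. The names
`partition_for`, `NWayJoinTask`, `run_job` and `assign_tasks` are
hypothetical, `zlib.crc32` stands in for Kafka's real partitioner, and the
plain dicts stand in for the local key-value stores a real Samza task
would use.

```python
import zlib
from typing import Any, Dict, List, Optional, Tuple

NUM_PARTITIONS = 4  # hypothetical; every input stream must use the same count


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Stand-in for the producer-side partitioner: same key -> same partition.

    Uses crc32 because Python's built-in hash() is salted per process.
    (Kafka's default partitioner uses a different hash; this is only a model.)
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


class NWayJoinTask:
    """Model of one task that owns the same partition of every input stream.

    Keeps one local table per input stream and emits a joined record once
    every stream has a value for the key.
    """

    def __init__(self, streams: List[str]) -> None:
        self.streams = streams
        self.tables: Dict[str, Dict[str, Any]] = {s: {} for s in streams}

    def process(self, stream: str, key: str, value: Any) -> Optional[Dict[str, Any]]:
        # Latest value wins, like a table built from a changelog.
        self.tables[stream][key] = value
        # Emit (or re-emit, on later updates) once all N sides are present.
        if all(key in self.tables[s] for s in self.streams):
            return {"key": key, **{s: self.tables[s][key] for s in self.streams}}
        return None


def run_job(
    streams: List[str],
    messages: List[Tuple[str, str, Any]],
    num_partitions: int = NUM_PARTITIONS,
) -> List[Dict[str, Any]]:
    """Route each (stream, key, value) message to the task for its partition."""
    tasks = [NWayJoinTask(streams) for _ in range(num_partitions)]
    output = []
    for stream, key, value in messages:
        joined = tasks[partition_for(key, num_partitions)].process(stream, key, value)
        if joined is not None:
            output.append(joined)
    return output


def assign_tasks(num_partitions: int, num_containers: int) -> Dict[int, List[int]]:
    """Illustrative round-robin spread of tasks (one per partition) over containers.

    Not Samza's actual grouper; it only shows that the container count is
    capped by the partition count, which is why over-partitioning helps.
    """
    if num_containers > num_partitions:
        raise ValueError("cannot use more containers than partitions")
    return {
        c: [p for p in range(num_partitions) if p % num_containers == c]
        for c in range(num_containers)
    }


if __name__ == "__main__":
    streams = ["orders", "customers", "shipments"]
    msgs = [
        ("orders", "c1", {"order": 100}),
        ("customers", "c1", {"name": "Ann"}),
        ("orders", "c2", {"order": 200}),
        ("shipments", "c1", {"carrier": "UPS"}),
    ]
    for row in run_job(streams, msgs):
        print(row)
```

Each task only ever sees keys from its own partition, which is why the
co-partitioning condition matters: if one stream were keyed differently,
its records for `c1` could land in a different task and the join would
never complete. Raising `NUM_PARTITIONS` is the over-partitioning knob;
`assign_tasks` shows that the number of containers you can spread the
work across is bounded by the number of partitions.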

2. I'm not entirely clear on this one. Could you elaborate on (2) with an
example? What are the input streams being processed, and how are they
generated? What operation are you trying to perform on the input streams'
data, and what output do you expect?

Thanks,
Jagadish


On Thu, Dec 3, 2015 at 9:59 PM, Josh Morris wrote:

> A colleague and I are trying to understand Samza a bit more, and the ideas
> behind it, both realized and yet to be realized. We've been through some of
> the referenced videos, articles and documentation, and were discussing a
> couple of use cases where we weren't sure how they would be solved.
>
> Use case 1 is about how a multi-way join query in SQL would be represented
> in Samza. The description of joins on the state management page[1] uses
> two-table joins as its examples. We have use cases with a much larger
> number of table joins, used to flatten (denormalize) data for storage in
> our reporting database. If we needed to join N tables, would this be a
> single Samza job with N input streams, or N-1 jobs each with at most 2
> input streams, where each job from layer 2 onward takes one of its input
> streams from the output of the previous job?
>
> Use case 2 is about how to enforce referential integrity. Using a shopping
> cart analogy, suppose I add a product to my cart. The cart is represented
> by an order record, and the product being added is represented by an item
> record containing a foreign key to the order record. In a traditional DB
> setting, if I try to insert an item record with an order id and no order
> with that id exists, the DB's referential integrity check fails. How does
> this work in the Samza case? Do all writes to the log (Kafka) succeed, with
> the integrity check done later when creating my view?
>
> [1] http://samza.apache.org/learn/documentation/0.9/container/state-management.html
>
> Thanks
> Josh
>



-- 
Jagadish V,
Graduate Student,
Department of Computer Science,
Stanford University
