Hey Mikel, This use case has been discussed in some detail here:
https://issues.apache.org/jira/browse/SAMZA-353 Samza currently doesn't allow a single partition to be consumed by more than one task. You can, however, send the same message to multiple partitions. Within Samza, this can be achieved using this OutgoingMessageEnvelope constructor: OutgoingMessageEnvelope(SystemStream systemStream, Object partitionKey, Object key, Object message) If you specify the partitionKey to be the partition #, you can send the same message to partition X and partition Y (in your example). Obviously, this isn't ideal (you have to write the same message N times), but it should work. Martin Kleppmann had a very similar use case, which he describes here: https://issues.apache.org/jira/browse/SAMZA-353?focusedCommentId=14205216&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14205216 Cheers, Chris On Fri, Feb 6, 2015 at 5:00 AM, mikel roah <[email protected]> wrote: > Hello, > > I'm working on a project with Samza. The system receives streams of > messages and if a single message matches a set of keywords, it performs an > action on it (i.e. deliver it outside the system or update the internal > state). There is one job that performs the keyword matching and it should > scale in 2 ways: > > - with the number of events > - with the number of keywords > > > The first point is achieved by controlling the number of partitions and > containers. Instead the second one by splitting the set of keywords over > different tasks that run in containers like this: > > > > > > This design would allow to handle messages and split the matching job over > different tasks. How hard is to deliver the message to task 1 on > partition X and to task 4 on partition Y? > > > Thanks > > > Mikel >
