[ 
https://issues.apache.org/jira/browse/SAMZA-353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14104739#comment-14104739
 ] 

Chris Riccomini commented on SAMZA-353:
---------------------------------------

A specific use case for the multiple-task-SSP-assignment is a "global state 
store"--a state store topic that all StreamTasks have full access to (not 
partitioned). Most use cases that I can think of primarily want this as a store 
to read from so that a join can be done against a non-partitioned data set.
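
As a rough illustration, the multiple-task assignment amounts to the container delivering each message from an SSP to every task registered for it, in line with the SSP to Set[TaskInstance] map proposed in the issue description quoted below. This is only a sketch: Ssp, RecordingTask and FanOutDispatcher are hypothetical stand-ins, not Samza's actual classes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Hypothetical stand-in for a system/stream/partition triple (not Samza's class).
final class Ssp {
    final String system;
    final String stream;
    final int partition;

    Ssp(String system, String stream, int partition) {
        this.system = system;
        this.stream = stream;
        this.partition = partition;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Ssp)) return false;
        Ssp other = (Ssp) o;
        return partition == other.partition
            && system.equals(other.system)
            && stream.equals(other.stream);
    }

    @Override
    public int hashCode() {
        return Objects.hash(system, stream, partition);
    }
}

// Hypothetical stand-in for a task instance; it only records what it receives.
final class RecordingTask {
    final String name;
    final List<String> received = new ArrayList<>();

    RecordingTask(String name) {
        this.name = name;
    }

    void process(String message) {
        received.add(message);
    }
}

// Maps each SSP to the set of tasks that should see its messages.
final class FanOutDispatcher {
    private final Map<Ssp, Set<RecordingTask>> tasksBySsp = new HashMap<>();

    // The same SSP may be registered for any number of tasks.
    void register(Ssp ssp, RecordingTask task) {
        tasksBySsp.computeIfAbsent(ssp, k -> new LinkedHashSet<>()).add(task);
    }

    // Delivers the message to every registered task and returns the delivery count.
    int deliver(Ssp ssp, String message) {
        Set<RecordingTask> tasks =
            tasksBySsp.getOrDefault(ssp, Collections.<RecordingTask>emptySet());
        for (RecordingTask task : tasks) {
            task.process(message);
        }
        return tasks.size();
    }
}
```

Registering the same global-state SSP for every task gives each task the full, non-partitioned data set to join against.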

An alternative approach to support "global state" would be to implement it more 
like the way we implement our current stores. Currently, state stores are 
bootstrapped via the restore() method with their own SystemConsumers outside of 
the standard MessageChooser/SSPGrouper code flow. We could follow this same 
model with "global state".

The drawbacks (that I can think of) to implementing global state stores as we 
currently implement state store restoration are:

# Our current state store implementation restores state stores fully, and then 
closes off the consumer, and never again feeds messages into the store (until 
the container is restarted). With global state stores, you might want to 
continue taking updates and putting them into the global state store. This is 
certainly the case when an offline Hadoop flow (for example) is sending 
messages to the stream periodically to update the state store.
# There is no opportunity in this model for the StreamTask to manipulate or 
filter the incoming global state messages. The SSPGrouper approach would allow 
StreamTasks to mess with the incoming messages before they're stored.
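
To make the two drawbacks concrete, here is a toy model of the restore-then-close behavior described above. RestoreOnlyStore and its methods are hypothetical names for illustration only, not Samza's storage API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a store that is bootstrapped once and then stops consuming.
final class RestoreOnlyStore {
    private final Map<String, String> data = new HashMap<>();
    private boolean consumerClosed = false;

    // Bootstrap: load the whole backing stream once, then close off the consumer.
    // Values go in unmodified; no task code gets to filter or change them.
    void restore(Map<String, String> backingStream) {
        data.putAll(backingStream);
        consumerClosed = true;
    }

    // Models a message that reaches the stream after restore has finished.
    // Returns true only if the update was applied to the store.
    boolean onUpdate(String key, String value) {
        if (consumerClosed) {
            return false; // nothing is reading the stream any more
        }
        data.put(key, value);
        return true;
    }

    String get(String key) {
        return data.get(key);
    }
}
```

In this model a periodic offline push that lands after restore() is never seen until the container restarts, which is the first drawback; the absence of any task-supplied hook in restore() is the second.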

One pro for this approach is that the state stores could be shared between 
StreamTasks in the same container. In the SSPGrouper approach, you end up with 
one store per task even if the stores are all identical.
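
A minimal sketch of the sharing idea, assuming a hypothetical per-container registry (ContainerStores is not a Samza class) and assuming all tasks in the container run on a single thread:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-container registry of global stores.
// Not thread-safe: assumes the container drives its tasks from one thread.
final class ContainerStores {
    private final Map<String, Map<String, String>> stores = new HashMap<>();

    // Every task asking for the same store name gets the same instance,
    // so the container holds one copy of the data rather than one per task.
    Map<String, String> shared(String storeName) {
        return stores.computeIfAbsent(storeName, k -> new HashMap<>());
    }
}
```

With the SSPGrouper approach, each task would instead build its own map from the messages delivered to it, so N tasks in a container hold N identical copies.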

> Support assigning the same SSP to multiple tasknames
> ----------------------------------------------------
>
>                 Key: SAMZA-353
>                 URL: https://issues.apache.org/jira/browse/SAMZA-353
>             Project: Samza
>          Issue Type: Bug
>          Components: container
>    Affects Versions: 0.8.0
>            Reporter: Jakob Homan
>
> Post SAMZA-123, it is possible to add the same SSP to multiple tasknames, 
> although currently we check for this and error out if this is done.  We 
> should think through the implications of having the same SSP appear in 
> multiple tasknames and support this if it makes sense.  
> This could be used as a broadcast stream that's either added by Samza itself 
> to each taskname, or individual groupers could do this as makes sense.  Right 
> now the container maintains a map of SSP to TaskInstance and delivers the ssp 
> to that task instance.  With this change, we'd need to change the map to SSP 
> to Set[TaskInstance] and deliver the message to each TI in the set.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
