[
https://issues.apache.org/jira/browse/SAMZA-353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14104739#comment-14104739
]
Chris Riccomini commented on SAMZA-353:
---------------------------------------
A specific use case for the multiple-task-SSP-assignment is a "global state
store"--a state store topic that all StreamTasks have full access to (not
partitioned). Most use cases that I can think of primarily want this as a store
to read from so that a join can be done against a non-partitioned data set.
An alternative approach to support "global state" would be to implement it more
like the way we implement our current stores. Currently, state stores are
bootstrapped via the restore() method with their own SystemConsumers outside of
the standard MessageChooser/SSPGrouper code flow. We could follow this same
model with "global state".
The drawbacks (that I can think of) to implementing global state stores the way
we currently implement state store restoration are:
# Our current state store implementation restores state stores fully, and then
closes off the consumer, and never again feeds messages into the store (until
the container is restarted). With global state stores, you might want to
continue taking updates and putting them into the global state store. This is
certainly the case when an offline Hadoop flow (for example) is sending
messages to the stream periodically to update the state store.
# There is no opportunity in this model for the StreamTask to manipulate or
filter the incoming global state messages. The SSPGrouper approach would allow
StreamTasks to mess with the incoming messages before they're stored.
One pro for this approach is that the state stores could be shared between
StreamTasks in the same container. In the SSPGrouper approach, you end up with
one store per task even if the stores are all identical.
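To make the shared-store alternative concrete, here is a minimal sketch of one store per container that is bootstrapped once and then keeps accepting updates. The names SharedGlobalStore, JoiningTask, restore() and applyUpdate() are hypothetical and used only for illustration; they are not Samza's actual classes or API.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical container-level global store; not one of Samza's real classes.
class SharedGlobalStore {
    private final Map<String, String> data = new ConcurrentHashMap<>();

    // Bootstrap phase: replay the whole changelog before any task runs.
    void restore(List<String[]> changelog) {
        for (String[] entry : changelog) {
            data.put(entry[0], entry[1]);
        }
    }

    // Unlike the current restore-then-close behaviour, keep accepting updates afterwards.
    void applyUpdate(String key, String value) {
        data.put(key, value);
    }

    String get(String key) {
        return data.get(key);
    }
}

// Hypothetical task that joins its input against the shared view.
class JoiningTask {
    private final SharedGlobalStore store;

    JoiningTask(SharedGlobalStore store) {
        this.store = store;
    }

    String join(String key) {
        String value = store.get(key);
        return key + "=" + (value == null ? "<missing>" : value);
    }
}
```

Two JoiningTask instances built from the same SharedGlobalStore see the same data, so the container holds one copy rather than one per task. Note that the tasks never see the incoming update messages in this sketch, which is the second drawback listed above.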
> Support assigning the same SSP to multiple tasknames
> ----------------------------------------------------
>
> Key: SAMZA-353
> URL: https://issues.apache.org/jira/browse/SAMZA-353
> Project: Samza
> Issue Type: Bug
> Components: container
> Affects Versions: 0.8.0
> Reporter: Jakob Homan
>
> Post SAMZA-123, it is possible to add the same SSP to multiple tasknames,
> although currently we check for this and error out if this is done. We
> should think through the implications of having the same SSP appear in
> multiple tasknames and support this if it makes sense.
> This could be used as a broadcast stream that's either added by Samza itself
> to each taskname, or individual groupers could do this as makes sense. Right
> now the container maintains a map of SSP to TaskInstance and delivers the ssp
> to that task instance. With this change, we'd need to change the map to SSP
> to Set[TaskInstance] and deliver the message to each TI in the set.
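The routing change described in the issue (a map from SSP to a set of task instances instead of a single one) could be sketched roughly as follows. This is an illustration only: BroadcastRouter is a hypothetical name, and SSPs and task instances are modeled as plain strings rather than Samza's real types.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the container's routing table; not Samza's actual code.
class BroadcastRouter {
    // An SSP is modeled as a string such as "kafka.global-state.0" (an assumption).
    private final Map<String, Set<String>> sspToTasks = new HashMap<>();
    // Messages delivered to each task name, recorded here for illustration.
    private final Map<String, List<String>> delivered = new HashMap<>();

    // Register a task for an SSP; the same SSP may be registered for many tasks.
    void register(String ssp, String taskName) {
        sspToTasks.computeIfAbsent(ssp, k -> new LinkedHashSet<>()).add(taskName);
    }

    // Deliver one message to every task registered for the SSP.
    // Returns the number of tasks that received it.
    int deliver(String ssp, String message) {
        Set<String> tasks = sspToTasks.getOrDefault(ssp, Collections.emptySet());
        for (String task : tasks) {
            delivered.computeIfAbsent(task, k -> new ArrayList<>()).add(message);
        }
        return tasks.size();
    }

    List<String> deliveredTo(String taskName) {
        return delivered.getOrDefault(taskName, Collections.emptyList());
    }
}
```

Registering the same SSP for two task names and calling deliver() once hands the message to both, which is the broadcast behaviour the issue asks about.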