[ https://issues.apache.org/jira/browse/SAMZA-353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14109641#comment-14109641 ]
Chinmay Soman commented on SAMZA-353:
-------------------------------------
Regarding broadcast stream:
=========================
* Can broadcast be a special type of Kafka topic (i.e. a new feature within
Kafka)? Instead of us mucking about with this in Samza, the more natural way
seems to be supporting it natively in Kafka.
Thoughts regarding global state store
===============================
* Modifying global state: my opinion is NO
=> Supporting all the edge cases would make this design complicated. Besides,
the user can always maintain a derived local store (i.e. based on global state +
input + modifications).
* Should we keep reading the global state stream?
=> My initial feeling was YES, because it saves the developer from having to
specify when to bootstrap the global state store (which in turn might need to
be coordinated with the change-set push to Kafka). However, on second thought,
it seems complicated because:
- You probably need a separate thread for handling this bootstrap (plus
scheduling effort)
- You will need to track offsets
An alternative design would be for Samza to handle the bootstrap at a defined
frequency (e.g. once every day). This seems way simpler!
* Global state store per container or per task?
=> This seems complicated. The good thing about per-task state is that we
don't need any other code change. However, it will end up using more space
(disk and/or memory), since each task holds its own copy.
The con of per-container global state is that we probably need a separate
thread OR coordination amongst all the tasks (if we don't want to halt the main
loop). Also, during recovery, either the main loop or one of the tasks has to be
responsible for restoring this global state. Again, my preference is to go
with the simple design -> per-task global state.
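
To make the "read-only global state + derived local store" idea a bit more concrete, here is a rough sketch in Java. It is only an illustration under my own assumptions: `DerivedLocalStore` and its methods are hypothetical names, not existing Samza APIs, and plain in-memory maps stand in for whatever store implementation would really be used. Each task keeps its own immutable copy of the global state, and local modifications shadow it without ever flowing back.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; none of these names are Samza APIs.
final class DerivedLocalStore {
    // Read-only copy of the global state as last bootstrapped for this task.
    private final Map<String, String> globalSnapshot;
    // Task-local modifications; these never flow back to the global state.
    private final Map<String, String> localOverrides = new HashMap<>();

    DerivedLocalStore(Map<String, String> globalSnapshot) {
        // Defensive copy so later changes by the caller do not leak in.
        this.globalSnapshot =
            Collections.unmodifiableMap(new HashMap<>(globalSnapshot));
    }

    // A local write shadows the global value for the same key.
    void put(String key, String value) {
        localOverrides.put(key, value);
    }

    // Check the local overrides first, then fall back to the global snapshot.
    String get(String key) {
        String local = localOverrides.get(key);
        return local != null ? local : globalSnapshot.get(key);
    }

    // A periodic re-bootstrap (e.g. once a day) produces a fresh store
    // that carries the local overrides forward.
    DerivedLocalStore withRefreshedGlobal(Map<String, String> newSnapshot) {
        DerivedLocalStore refreshed = new DerivedLocalStore(newSnapshot);
        refreshed.localOverrides.putAll(localOverrides);
        return refreshed;
    }
}
```

With something like this, a per-task store needs no cross-task coordination, at the cost of one snapshot copy per task.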
> Support assigning the same SSP to multiple tasknames
> ----------------------------------------------------
>
> Key: SAMZA-353
> URL: https://issues.apache.org/jira/browse/SAMZA-353
> Project: Samza
> Issue Type: Bug
> Components: container
> Affects Versions: 0.8.0
> Reporter: Jakob Homan
> Attachments: DESIGN-SAMZA-353-0.md, DESIGN-SAMZA-353-0.pdf
>
>
> Post SAMZA-123, it is possible to add the same SSP to multiple tasknames,
> although currently we check for this and error out if this is done. We
> should think through the implications of having the same SSP appear in
> multiple tasknames and support this if it makes sense.
> This could be used as a broadcast stream that's either added by Samza itself
> to each taskname, or individual groupers could do this as makes sense. Right
> now the container maintains a map of SSP to TaskInstance and delivers the ssp
> to that task instance. With this change, we'd need to change the map to SSP
> to Set[TaskInstance] and deliver the message to each TI in the set.
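
The change from SSP -> TaskInstance to SSP -> Set[TaskInstance] described above could look roughly like the following. This is a minimal sketch in Java under stated assumptions: `Ssp`, `TaskSink`, and `FanOutRouter` are hypothetical stand-ins for the container's real classes, and messages are plain strings.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Hypothetical stand-in for a system/stream/partition key.
final class Ssp {
    final String system;
    final String stream;
    final int partition;

    Ssp(String system, String stream, int partition) {
        this.system = system;
        this.stream = stream;
        this.partition = partition;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Ssp)) return false;
        Ssp other = (Ssp) o;
        return partition == other.partition
            && system.equals(other.system)
            && stream.equals(other.stream);
    }

    @Override
    public int hashCode() {
        return Objects.hash(system, stream, partition);
    }
}

// Hypothetical stand-in for a task instance that can receive messages.
interface TaskSink {
    void process(Ssp ssp, String message);
}

// Routes each message to every task registered for its SSP.
final class FanOutRouter {
    private final Map<Ssp, Set<TaskSink>> tasksBySsp = new HashMap<>();

    // The same SSP may be registered for several tasks (no error-out).
    void register(Ssp ssp, TaskSink task) {
        tasksBySsp.computeIfAbsent(ssp, k -> new LinkedHashSet<>()).add(task);
    }

    // Delivers the message to each registered task; returns how many got it.
    int deliver(Ssp ssp, String message) {
        Set<TaskSink> tasks =
            tasksBySsp.getOrDefault(ssp, Collections.<TaskSink>emptySet());
        for (TaskSink task : tasks) {
            task.process(ssp, message);
        }
        return tasks.size();
    }
}
```

A broadcast stream would then simply be an SSP registered for every task, while ordinary input SSPs keep a single-element set.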