[ https://issues.apache.org/jira/browse/SAMZA-402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125052#comment-14125052 ]
Martin Kleppmann commented on SAMZA-402:
----------------------------------------
bq. The deciding factor to me on which approach is actually going to be
"better" for a Samza job is whether the state that it needs is already in a DB.
If it's already in a DB, and has to continue to remain there for other reasons,
then there is complexity in setting up a change log and having the Samza job
consume the state (vs. just querying it).

If the DB is already there, then the job author can just use the client library
to query it directly. Any caching that is put in place isn't really specific to
Samza in any way. So I am inclined to think that Samza doesn't need explicit
support for calls to external DBs — people can just use whatever existing
mechanisms there are. Or am I missing something?
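
As a rough illustration of the caching point (this is not from the ticket and uses no Samza API): a read-through cache in front of an external lookup needs nothing from the framework. In the sketch below, `ReadThroughCache`, `fake_db`, and `process` are hypothetical names, and a plain dict stands in for a real DB client library.

```python
from collections import OrderedDict
from typing import Any, Callable, Hashable


class ReadThroughCache:
    """Small LRU read-through cache in front of any lookup function."""

    def __init__(self, lookup: Callable[[Hashable], Any], max_entries: int = 1000):
        self._lookup = lookup            # assumption: e.g. a DB client's get-by-key call
        self._max_entries = max_entries
        self._entries = OrderedDict()    # insertion order doubles as recency order
        self.misses = 0                  # counts how often we fell through to the DB

    def get(self, key: Hashable) -> Any:
        if key in self._entries:
            self._entries.move_to_end(key)      # mark as most recently used
            return self._entries[key]
        self.misses += 1
        value = self._lookup(key)               # fall through to the external store
        self._entries[key] = value
        if len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)   # evict the least recently used entry
        return value


# Stand-in for a real database; a dict lookup replaces the client call.
fake_db = {"user-1": "alice", "user-2": "bob"}
profiles = ReadThroughCache(fake_db.get, max_entries=1)


def process(message: dict) -> tuple:
    """Hypothetical per-message handler that enriches a message from the DB."""
    return (message["user"], profiles.get(message["user"]))
```

The same class could be called from a stream task's message handler or from any other program, which is the sense in which the caching is not specific to Samza.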

bq. If it's not, then the global state solution seems preferable (since the
data is probably coming from a Hadoop push).

I agree that the discussed approach for global state is preferable if there is
no existing DB (due to the operational complexity of running the additional
DB). However, I wouldn't assume that such global state would be coming from
Hadoop. Indeed, the two example use cases I gave above have nothing to do with
Hadoop.

> Provide a "shared state" store among StreamTasks
> ------------------------------------------------
>
> Key: SAMZA-402
> URL: https://issues.apache.org/jira/browse/SAMZA-402
> Project: Samza
> Issue Type: Bug
> Components: container, kv
> Affects Versions: 0.8.0
> Reporter: Chris Riccomini
> Attachments: DESIGN-SAMZA-402-0.md, DESIGN-SAMZA-402-0.pdf
>
>
> There has been a lot of discussion about shared state stores in SAMZA-353.
> Initially, it seemed as though we might implement them through SAMZA-353, but
> now it seems preferable to implement them separately. As such, this
> ticket is to discuss global state/shared state (terms that are being used
> interchangeably) between StreamTasks.