[ 
https://issues.apache.org/jira/browse/SAMZA-402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125052#comment-14125052
 ] 

Martin Kleppmann commented on SAMZA-402:
----------------------------------------

bq. The deciding factor to me on which approach is actually going to be 
"better" for a Samza job is whether the state that it needs is already in a DB. 
If it's already in a DB, and has to continue to remain there for other reasons, 
then there is complexity in setting up a change log and having the Samza job 
consume the state (vs. just querying it). 

If the DB is already there, then the job author can just use the client library 
to query it directly. Any caching that is put in place isn't really specific to 
Samza in any way. So I am inclined to think that Samza doesn't need explicit 
support for calls to external DBs; people can just use whatever existing 
mechanisms there are. Or am I missing something?
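
To illustrate the point about caching, here is a minimal sketch of a read-through cache sitting in front of a DB lookup. It is only an assumption of what such a setup might look like: `ReadThroughCache`, `fake_db` and `process` are hypothetical names, the dictionary stands in for a real client library, and nothing in it touches a Samza API.

```python
from collections import OrderedDict


class ReadThroughCache:
    """Small LRU read-through cache in front of an arbitrary lookup function."""

    def __init__(self, lookup, max_entries=1000):
        self._lookup = lookup            # hypothetical stand-in for a DB client call
        self._max_entries = max_entries
        self._entries = OrderedDict()    # insertion order doubles as recency order
        self.misses = 0                  # number of times the backing store was queried

    def get(self, key):
        if key in self._entries:
            self._entries.move_to_end(key)      # mark as most recently used
            return self._entries[key]
        self.misses += 1
        value = self._lookup(key)               # fall through to the "database"
        self._entries[key] = value
        if len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)   # evict the least recently used entry
        return value


# Hypothetical in-memory "database" standing in for a real client library.
fake_db = {"user-1": "Alice", "user-2": "Bob"}
cache = ReadThroughCache(fake_db.get, max_entries=1)


def process(message_key):
    """Stand-in for a task's per-message handler: join the key with DB state."""
    return (message_key, cache.get(message_key))
```

The same wrapper would work in any program that queries the DB, which is the sense in which the caching is not Samza-specific. The sketch has no expiry or invalidation, so cached values can go stale if the DB changes underneath it.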

bq. If it's not, then the global state solution seems preferable (since the 
data is probably coming from a Hadoop push).

I agree that the discussed approach for global state is preferable if there is 
no existing DB (due to the operational complexity of running the additional 
DB). However, I wouldn't assume that such global state would be coming from 
Hadoop. Indeed, the two example use cases I gave above have nothing to do with 
Hadoop.

> Provide a "shared state" store among StreamTasks
> ------------------------------------------------
>
>                 Key: SAMZA-402
>                 URL: https://issues.apache.org/jira/browse/SAMZA-402
>             Project: Samza
>          Issue Type: Bug
>          Components: container, kv
>    Affects Versions: 0.8.0
>            Reporter: Chris Riccomini
>         Attachments: DESIGN-SAMZA-402-0.md, DESIGN-SAMZA-402-0.pdf
>
>
> There has been a lot of discussion about shared state stores in SAMZA-353. 
> Initially, it seemed as though we might implement them through SAMZA-353, but 
> now it seems preferable to implement them separately. As such, this 
> ticket is to discuss global state/shared state (terms that are being used 
> interchangeably) between StreamTasks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
