Hi Andreas,

I don't quite understand this part:

"Because the messages coming into the input stream are random (i.e. can hit
any partition and therefore any task), each task will need its own copy of
the data (i.e. the data needs to be duplicated across each task)."

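Messages are routed to a partition of the input stream based on their
partition key, so the assignment is not totally random: messages with the
same key always land in the same partition, and therefore in the same task.
Why does each task need its own copy of the data? Do you mean a copy of the
data held in other partitions?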
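
To illustrate what I mean by keyed routing, here is a minimal sketch in
Java. It assumes a simple hash-modulo partitioner; the partitioner your
producer actually uses may apply a different hash function. The class
PartitionSketch and the method partitionFor are hypothetical names for
illustration only, not part of the Samza or Kafka API.

```java
import java.util.Arrays;
import java.util.List;

public class PartitionSketch {
    // Hypothetical helper: maps a partition key to a partition index.
    // Assumption: plain hashCode modulo the partition count.
    static int partitionFor(String key, int numPartitions) {
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        int numPartitions = 4; // one task per partition, as in the original question
        List<String> keys = Arrays.asList("user-1", "user-2", "user-1", "user-3", "user-2");
        for (String key : keys) {
            // The same key always yields the same partition, hence the same task.
            System.out.println(key + " -> partition " + partitionFor(key, numPartitions));
        }
    }
}
```

If the messages are keyed by the same field used to look up the local
store, each task only ever sees its own subset of keys and would only need
the matching subset of the data.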

Cheers,

Fang, Yan
yanfang...@gmail.com

On Tue, May 5, 2015 at 11:47 AM, Andreas Simanowski <aesim...@gmail.com>
wrote:

> Hello Samza community:
>
> I am very new to Samza and currently looking at how to use Samza and its
> key-value store. I have run into the following and was hoping someone could
> point me in the right direction.
>
> Say we have an input stream being consumed by more than one task (one task
> per partition). Each task has a local key-value store which it will
> reference when processing the messages. Because the messages coming into
> the input stream are random (i.e. can hit any partition and therefore any
> task), each task will need its own copy of the data (i.e. the data needs to
> be duplicated across each task). From time-to-time this local data would
> also need to be updated with changes. What approaches are there to share
> data between the tasks to keep them up to date?
>
> Thanks for the help!
>
> -Andreas
>
