Hi Shekar, You can use Kafka's partitioning capabilities to partition your stream based on application. That will make sure events related to a application will always ended up in same partition. With this you will have multiple applications in same partition and each partition will be mapped to a single Samza task instance (AFAIK, Samza job is devided into several task instances based on number of partitions in your topic.). Then in your Samza task implementation you should maintain windowing counts for each application.
Thanks Milinda On Sun, Jun 28, 2015 at 8:48 PM, Shekar Tippur <ctip...@gmail.com> wrote: > Milinda, > > I see that the document you mentioned addresses windowing but I also need > to group by different applications. > > Application Count > --------------- -------- > A 100 > B 40 > C 69 > .... > > - Shekar > > On Fri, Jun 26, 2015 at 11:39 AM, Shekar Tippur <ctip...@gmail.com> wrote: > > > Never mind. I see it here: > > > > http://samza.apache.org/learn/documentation/0.8/container/windowing.html > > > > Thanks again Milinda. > > > > - Shekar > > > > On Fri, Jun 26, 2015 at 11:39 AM, Shekar Tippur <ctip...@gmail.com> > wrote: > > > >> Thanks Milinda. > >> Is this feature available on 0.8 version of Samza? > >> > >> - Shekar > >> > >> On Fri, Jun 26, 2015 at 11:24 AM, Milinda Pathirage < > >> mpath...@umail.iu.edu> wrote: > >> > >>> Hi Shekar, > >>> > >>> You can use Samza's local storage ( > >>> > >>> > http://samza.apache.org/learn/documentation/0.9/container/state-management.html > >>> ) > >>> to keep the window state and windowing ( > >>> > http://samza.apache.org/learn/documentation/0.9/container/windowing.html > >>> ) > >>> capabilities to handle the window advancement. During advancement you > can > >>> update the local cache (Redis in your case). AFAIK, Samza doesn't > provide > >>> any helpers or utilities to handle window state maintenance. You have > to > >>> implement it on top of local storage or if you don't won't fault > >>> tolerance > >>> you can keep the state in-memory too (as long as the state fit in > >>> memory). > >>> > >>> Thanks > >>> Milinda > >>> > >>> On Fri, Jun 26, 2015 at 1:53 PM, Shekar Tippur <ctip...@gmail.com> > >>> wrote: > >>> > >>> > Yan, > >>> > > >>> > > >>> > *What do you mean by "a local cache"? Is it a db like MySQL, > something > >>> > likeRocksDB, or even just in-memory?* > >>> > > >>> > Local cache as in Redis > >>> > > >>> > > >>> > > >>> > *When you say "another topic", is this the topic consumed by the same > >>> > Samzajob as your 5-minutes-job, or in a separate job? What is the > >>> > relationbetween the topic and the application name* > >>> > > >>> > We dont have a 5 min job. All we have now is a stream of events > coming > >>> from > >>> > a bunch of applications. All these land on a raw kafka topic. The > >>> stream > >>> > data has application name. I want to create a job that takes incoming > >>> > stream and group it by application name and count the number of > events > >>> we > >>> > get in a 5 min sliding window. > >>> > > >>> > - Shekar > >>> > > >>> > On Fri, Jun 26, 2015 at 10:29 AM, Yan Fang <yanfang...@gmail.com> > >>> wrote: > >>> > > >>> > > Hi Shekar, > >>> > > > >>> > > Need a little more clarification. > >>> > > > >>> > > What do you mean by "a local cache"? Is it a db like MySQL, > something > >>> > like > >>> > > RocksDB, or even just in-memory? > >>> > > > >>> > > When you say "another topic", is this the topic consumed by the > same > >>> > Samza > >>> > > job as your 5-minutes-job, or in a separate job? What is the > relation > >>> > > between the topic and the application name? > >>> > > > >>> > > Thanks, > >>> > > > >>> > > Fang, Yan > >>> > > yanfang...@gmail.com > >>> > > > >>> > > On Fri, Jun 26, 2015 at 1:08 AM, Shekar Tippur <ctip...@gmail.com> > >>> > wrote: > >>> > > > >>> > > > Hello, > >>> > > > My apologies if I have raised it earlier. > >>> > > > Here is the use case: > >>> > > > I have a stream that is partitioned based on application name. I > >>> want > >>> > to > >>> > > be > >>> > > > able to count hte number of events happening for that particular > >>> > > > application in the past 5 minutes (sliding window) and update > >>> either > >>> > > > another topic or a local cache. > >>> > > > > >>> > > > Is this possible via 0.9 version of Samza? > >>> > > > If not, what is the easiest way to achieve this? > >>> > > > > >>> > > > - Shekar > >>> > > > > >>> > > > >>> > > >>> > >>> > >>> > >>> -- > >>> Milinda Pathirage > >>> > >>> PhD Student | Research Assistant > >>> School of Informatics and Computing | Data to Insight Center > >>> Indiana University > >>> > >>> twitter: milindalakmal > >>> skype: milinda.pathirage > >>> blog: http://milinda.pathirage.org > >>> > >> > >> > > > -- Milinda Pathirage PhD Student | Research Assistant School of Informatics and Computing | Data to Insight Center Indiana University twitter: milindalakmal skype: milinda.pathirage blog: http://milinda.pathirage.org