Well, that is much clearer, but I still have a question :-)

Suppose we have 3 map input records:

event1 | 10:07
event2 | 10:10
event3 | 10:12

Output from map(event1 | 10:07) will be:

mapOutput(10:04:event1)
mapOutput(10:05:event1)
mapOutput(10:06:event1)
mapOutput(10:07:event1)
mapOutput(10:08:event1)
mapOutput(10:09:event1)
mapOutput(10:10:event1)

Output from map(event2 | 10:10) will be:

mapOutput(10:07:event2)
mapOutput(10:08:event2)
mapOutput(10:09:event2)
mapOutput(10:10:event2)
mapOutput(10:11:event2)
mapOutput(10:12:event2)
mapOutput(10:13:event2)

Output from map(event3 | 10:12) will be:

mapOutput(10:09:event3)
mapOutput(10:10:event3)
mapOutput(10:11:event3)
mapOutput(10:12:event3)
mapOutput(10:13:event3)
mapOutput(10:14:event3)
mapOutput(10:15:event3)

Is that correct? If yes, in the reducer phase I will get these inputs:

reducer(10:04:event1)
reducer(10:05:event1)
reducer(10:06:event1)
reducer(10:07:event1, event2)
reducer(10:08:event1, event2)
reducer(10:09:event1, event2, event3)
reducer(10:10:event1, event2, event3)
reducer(10:11:event2, event3)
reducer(10:12:event2, event3)
reducer(10:13:event2, event3)
reducer(10:14:event3)
reducer(10:15:event3)

Iterating over each reducer input, how can I tell at the end of the
aggregation which events fell within the same 7-minute window?

Thanks
Oleg.

On Mon, Jan 28, 2013 at 3:48 PM, Kai Voigt <k...@123.org> wrote:

> Hi again,
>
> the idea is that you emit every event multiple times. So your map input
> record (event1, 10:07) will be emitted seven times during the map() call.
> Like I said, (10:04,event1), (10:05,event1), ..., (10:10,event1) will be
> the seven outputs for processing a single event.
>
> The output key is a time stamp that names a neighbourhood or interval:
> each event is emitted under every minute within +/- 3 minutes of its own
> time. So events which happened within the same 7-minute window will both
> be emitted with a common time stamp as the map() output key, and thus
> meet in a reduce() call.
>
> A reduce() call will look like this: reduce(10:03, list_of_events).
> And those events had time stamps between 10:00 and 10:06 in the original
> input.
>
> Kai
>
> On 28.01.2013 at 14:43, Oleg Ruchovets <oruchov...@gmail.com> wrote:
>
> > Hi Kai.
> > It is very interesting. Can you please explain your idea in more
> > detail?
> > What will be the key in the map phase?
> >
> > Suppose we have an event at 10:07. How would you emit this to multiple
> > buckets?
> >
> > Thanks
> > Oleg.
> >
> >
> > On Mon, Jan 28, 2013 at 3:17 PM, Kai Voigt <k...@123.org> wrote:
> >
> >> Quick idea:
> >>
> >> since each of your events will go into several buckets, you could use
> >> map() to emit each item multiple times, once for each bucket.
> >>
> >> On 28.01.2013 at 13:56, Oleg Ruchovets <oruchov...@gmail.com> wrote:
> >>
> >>> Hi,
> >>> I have the following raw data structure:
> >>>
> >>> event_id | time
> >>> ==============
> >>> event1 | 10:07
> >>> event2 | 10:10
> >>> event3 | 10:12
> >>>
> >>> event4 | 10:20
> >>> event5 | 10:23
> >>> event6 | 10:25
> >>
> >> map(event1,10:07) would emit (10:04,event1), (10:05,event1), ...,
> >> (10:10,event1) and so on.
> >>
> >> In reduce(), all your desired events would meet for the same minute.
> >>
> >> Kai
> >>
> >> --
> >> Kai Voigt
> >> k...@123.org
>
> --
> Kai Voigt
> k...@123.org
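
For reference, the bucketing scheme discussed in this thread can be simulated in a few lines. This is only a minimal sketch in plain Python, not Hadoop code: the names map_event, shuffle and WINDOW are made up for illustration, and the grouping step merely imitates what the framework does between map() and reduce(). It assumes all time stamps fall on the same day, so HH:MM strings sort correctly.

```python
from collections import defaultdict
from datetime import datetime, timedelta

WINDOW = 3  # assumed: minutes on each side of an event, i.e. 7 buckets per event


def map_event(event_id, hhmm):
    """Emit (bucket, event_id) for every minute within +/- WINDOW of the event."""
    t = datetime.strptime(hhmm, "%H:%M")
    for offset in range(-WINDOW, WINDOW + 1):
        bucket = (t + timedelta(minutes=offset)).strftime("%H:%M")
        yield bucket, event_id


def shuffle(pairs):
    """Group map outputs by key, imitating the step between map() and reduce()."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


# The three input records from the thread.
records = [("event1", "10:07"), ("event2", "10:10"), ("event3", "10:12")]

# Each entry is what one reduce() call would receive: (bucket, list_of_events).
reducer_inputs = shuffle(pair for rec in records for pair in map_event(*rec))

for bucket, events in reducer_inputs.items():
    print(bucket, events)
```

Every list printed for a bucket holds the events whose own time stamps lie within 3 minutes of that bucket, so any two events in the same list are at most 6 minutes apart. The same group can show up under several neighbouring buckets (here, all three events meet at both 10:09 and 10:10), so a reducer that reports groups would need some way to de-duplicate them.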