Hi again, the idea is that you emit every event multiple times, once for each bucket it belongs to. So your map input record (event1, 10:07) will be emitted seven times during the map() call. Like I said, (10:04,event1), (10:05,event1), ..., (10:10,event1) will be the seven outputs for processing that single event.
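To make that concrete, here is a rough sketch in plain Python rather than real Hadoop code. The names map_event and group_by_key are made up for illustration, and I'm assuming time stamps are HH:MM strings on the same day (the sketch ignores dates, so buckets near midnight would wrap around). Grouping the pairs by key only simulates what the framework's shuffle would do before reduce() is called.

```python
from collections import defaultdict
from datetime import datetime, timedelta

WINDOW_MINUTES = 3  # assumption: events are joined within +/- 3 minutes


def map_event(event_id, time_str):
    """Emit (bucket, event_id) once for every minute in the +/- 3 minute window."""
    t = datetime.strptime(time_str, "%H:%M")
    for offset in range(-WINDOW_MINUTES, WINDOW_MINUTES + 1):
        bucket = t + timedelta(minutes=offset)
        yield bucket.strftime("%H:%M"), event_id


def group_by_key(pairs):
    """Stand-in for the shuffle: collect all values emitted under the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


# Sample records taken from the original question.
events = [("event1", "10:07"), ("event2", "10:10"), ("event3", "10:12")]
pairs = [pair for event_id, ts in events for pair in map_event(event_id, ts)]
groups = group_by_key(pairs)

print(list(map_event("event1", "10:07")))  # seven pairs, keys 10:04 .. 10:10
print(groups["10:10"])                     # events that share the 10:10 bucket
```

With the three sample events, the 10:10 bucket ends up holding event1, event2 and event3, because 10:10 lies within 3 minutes of each of them.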
The output key is the time stamp of a bucket, i.e. a minute around which events that happened within +/- 3 minutes of it are collected. So two events that fall into the same 7-minute window (at most 6 minutes apart) will both be emitted with the same time stamp as map() output key, and thus meet in a reduce() call. A reduce() call will look like this: reduce(10:03, list_of_events), and those events had time stamps between 10:00 and 10:06 in the original input.

Kai

On 28.01.2013 at 14:43, Oleg Ruchovets <oruchov...@gmail.com> wrote:

> Hi Kai.
> It is very interesting. Can you please explain in more details your
> Idea?
> What will be a key in a map phase?
>
> Suppose we have event at 10:07. How would you emit this to the multiple
> buckets?
>
> Thanks
> Oleg.
>
>
> On Mon, Jan 28, 2013 at 3:17 PM, Kai Voigt <k...@123.org> wrote:
>
>> Quick idea:
>>
>> since each of your events will go into several buckets, you could use
>> map() to emit each item multiple times for each bucket.
>>
>> On 28.01.2013 at 13:56, Oleg Ruchovets <oruchov...@gmail.com> wrote:
>>
>>> Hi ,
>>> I have such row data structure:
>>>
>>> event_id | time
>>> ==============
>>> event1 | 10:07
>>> event2 | 10:10
>>> event3 | 10:12
>>>
>>> event4 | 10:20
>>> event5 | 10:23
>>> event6 | 10:25
>>
>> map(event1,10:07) would emit (10:04,event1), (10:05,event1), ...,
>> (10:10,event1) and so on.
>>
>> In reduce(), all your desired events would meet for the same minute.
>>
>> Kai
>>
>> --
>> Kai Voigt
>> k...@123.org

--
Kai Voigt
k...@123.org