It seems like this problem could be done with a single map-reduce job.
In your map, emit (ID, {type, timestamp}) for each input line, so that all records for the same ID arrive at the same reduce call.
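
As a rough sketch of the map side, assuming the fields are whitespace-separated as in your sample, the parsing could look like this in plain Java. The EventParser and Event names are made up for illustration, and a real Mapper would emit the pair through Hadoop's output collector instead of returning it.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

class EventParser {
    // Hypothetical value type holding the {type, timestamp} part of a record.
    static final class Event {
        final String type;
        final long timestamp;

        Event(String type, long timestamp) {
            this.type = type;
            this.timestamp = timestamp;
        }
    }

    // Splits one input line into the key (ID) and the value (type, timestamp).
    // Assumes exactly three whitespace-separated fields per line.
    static Map.Entry<String, Event> parse(String line) {
        String[] fields = line.trim().split("\\s+");
        if (fields.length != 3) {
            throw new IllegalArgumentException("expected 3 fields: " + line);
        }
        Event value = new Event(fields[1], Long.parseLong(fields[2]));
        return new SimpleEntry<>(fields[0], value);
    }

    public static void main(String[] args) {
        Map.Entry<String, Event> kv = parse("A1   Y   1215647409");
        System.out.println(kv.getKey() + " -> " + kv.getValue().type
                + " @ " + kv.getValue().timestamp);
    }
}
```

Keying on the ID alone is what lets the reduce see every X and Y for A1 together.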

In your reduce, you can figure out how many A1 Y's appear close to an A1 X. One naive approach is to iterate through all of the {type, timestamp} values for an ID and collect them in a list. Then, if your custom value class implements Comparable (ordering by timestamp), you can just call Collections.sort(myList) and scan the sorted list. I'm sure there are faster solutions (perhaps you could sort them as you iterate through by hashing based on timestamp?).
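
Here is a minimal sketch of that naive reduce-side approach in plain Java, with no Hadoop classes, so the names WindowCount, Event and countYNearX are hypothetical. It makes two assumptions: a Y counts if it occurs at or after the most recent X and no more than the window (5 seconds) later, and each Y is counted at most once.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class WindowCount {
    // Hypothetical value class; Comparable so Collections.sort can order it.
    static final class Event implements Comparable<Event> {
        final String type;
        final long timestamp;

        Event(String type, long timestamp) {
            this.type = type;
            this.timestamp = timestamp;
        }

        @Override
        public int compareTo(Event other) {
            int byTime = Long.compare(timestamp, other.timestamp);
            // On equal timestamps, "X" sorts before "Y" so the X is seen first.
            return byTime != 0 ? byTime : type.compareTo(other.type);
        }
    }

    // Counts Y events that fall within windowSeconds after the latest X.
    // 'values' stands in for the values a reduce call receives for one ID.
    static int countYNearX(Iterable<Event> values, long windowSeconds) {
        List<Event> events = new ArrayList<>();
        for (Event e : values) {
            events.add(e);
        }
        Collections.sort(events);

        int count = 0;
        boolean seenX = false;
        long lastX = 0;
        for (Event e : events) {
            if (e.type.equals("X")) {
                seenX = true;
                lastX = e.timestamp;
            } else if (e.type.equals("Y") && seenX
                    && e.timestamp - lastX <= windowSeconds) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // The two A1 records from the sample data, deliberately out of order.
        List<Event> a1 = Arrays.asList(
                new Event("Y", 1215647409L),
                new Event("X", 1215647404L));
        System.out.println(countYNearX(a1, 5)); // prints 1
    }
}
```

This buffers every value for an ID in memory, which is fine when each ID has a modest number of records but will not scale to very hot keys.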

Does this answer your question?

On Jul 9, 2008, at 4:59 PM, Elia Mazzawi wrote:

Can someone point me to an example I can learn from?

I have a data set that looks like this:

ID    type   Timestamp

A1    X      1215647404
A2    X      1215647405
A3    X      1215647406
A1    Y      1215647409

I want to count how many A1 Y records show up within 5 seconds of an A1 X.

I've written a few Hadoop programs already, but they were based on the
wordcount example, and so only work with one line at a time.
Does this problem require looking back or remembering state, or more than
one pass?
I was thinking that it is possible to sort the data by ID, then timestamp.
In that case, would the program only need to look back a few lines at a time?

This seems like a common problem, so I thought I'd ask if there is an example
that is close to it, or if someone has written something already.

P.S. Hadoop Rocks!
