Can someone point me to an example I can learn from? I have a data set that looks like this:
    ID  type  Timestamp
    A1  X     1215647404
    A2  X     1215647405
    A3  X     1215647406
    A1  Y     1215647409

I want to count how many A1 Y records show up within 5 seconds of an A1 X record.

I've written a few Hadoop programs already, but they were based on the WordCount example, so they only work with one line at a time. This problem seems to require looking back, or remembering state, or maybe more than one pass?

I was thinking it might be possible to sort the data by ID and then timestamp. In that case, would the program only need to look back a few lines at a time?

This seems like a common problem, so I thought I'd ask whether there is an example that is close to this, or whether someone has written something already.

P.S. Hadoop Rocks!
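Here is a rough local sketch of the sort-by-ID-then-timestamp idea. It is plain Python run on one machine, not the Hadoop API, and the names `map_record`, `reduce_id` and `run_job` are made up for illustration. I'm also assuming the window is inclusive and only looks backwards (a Y counts if some X with the same ID happened 0 to 5 seconds earlier), since in the sample A1 Y at 1215647409 is exactly 5 seconds after A1 X at 1215647404. In a real job the mapper would emit the ID as the key, and the per-ID timestamp order would come from a secondary sort or from sorting the values inside the reducer.

```python
from __future__ import annotations

from collections import defaultdict, deque
from typing import Iterable

# Assumption: inclusive window, Y must come at most 5 s after an X.
WINDOW_SECONDS = 5


def map_record(line: str) -> tuple[str, tuple[int, str]]:
    """Hypothetical mapper: key each record by ID so one reducer sees all of an ID's events."""
    record_id, record_type, timestamp = line.split()
    return record_id, (int(timestamp), record_type)


def reduce_id(events: Iterable[tuple[int, str]]) -> int:
    """Hypothetical reducer for one ID; events must already be sorted by timestamp.

    Counts Y events that have at least one X event 0..WINDOW_SECONDS earlier.
    Only the X timestamps still inside the window are remembered.
    """
    recent_x: deque = deque()
    count = 0
    for timestamp, record_type in events:
        # Drop X events too old to match this event or any later one.
        while recent_x and timestamp - recent_x[0] > WINDOW_SECONDS:
            recent_x.popleft()
        if record_type == "X":
            recent_x.append(timestamp)
        elif record_type == "Y" and recent_x:
            count += 1
    return count


def run_job(lines: Iterable[str]) -> dict[str, int]:
    """Simulates map -> shuffle/sort -> reduce locally."""
    grouped: dict[str, list[tuple[int, str]]] = defaultdict(list)
    for line in lines:
        if line.strip():
            key, value = map_record(line)
            grouped[key].append(value)
    # "Secondary sort": order each ID's events by timestamp; on a tie "X" sorts before "Y".
    return {key: reduce_id(sorted(values)) for key, values in sorted(grouped.items())}


if __name__ == "__main__":
    data = """\
A1 X 1215647404
A2 X 1215647405
A3 X 1215647406
A1 Y 1215647409
"""
    print(run_job(data.splitlines()))
```

If this is right, each reducer call makes a single pass over one ID's sorted events and keeps only a small queue of recent X timestamps, so it needs a little state rather than a second pass over the data.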