Can someone point me to an example I can learn from?

I have a data set that looks like this:

ID   type  Timestamp
A1   X     1215647404
A2   X     1215647405
A3   X     1215647406
A1   Y     1215647409

I want to count how many A1 Y records show up within 5 seconds of an A1 X record.
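To pin down what is being counted, here is a minimal brute-force sketch in plain Java (no Hadoop). It assumes "within 5 seconds of an A1 X" means the X comes at or before the Y and the gap is at most 5 seconds, inclusive, and that each Y is counted once even if several X records qualify. The `Event` record and `countYAfterX` method are hypothetical names for illustration; records need Java 16 or later.

```java
import java.util.List;

public class BruteForceCount {
    // Hypothetical record for one input line: ID, type, Unix timestamp in seconds.
    record Event(String id, String type, long ts) {}

    // Counts Y events for `id` that have at least one X event for the same `id`
    // at most `windowSec` seconds earlier (inclusive).
    // Assumption: the X must precede (or tie with) the Y.
    static int countYAfterX(List<Event> events, String id, long windowSec) {
        int count = 0;
        for (Event y : events) {
            if (!y.id().equals(id) || !y.type().equals("Y")) continue;
            for (Event x : events) {
                if (x.id().equals(id) && x.type().equals("X")
                        && x.ts() <= y.ts() && y.ts() - x.ts() <= windowSec) {
                    count++;
                    break; // count each Y at most once
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<Event> events = List.of(
                new Event("A1", "X", 1215647404L),
                new Event("A2", "X", 1215647405L),
                new Event("A3", "X", 1215647406L),
                new Event("A1", "Y", 1215647409L));
        System.out.println(countYAfterX(events, "A1", 5)); // prints 1
    }
}
```

This compares every Y against every X, so it is only useful as a reference definition on small samples, not as something to run over a large data set.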

I've written a few Hadoop programs already, but they were all based on the
WordCount example, so they only work with one line at a time.
Does this problem require looking back (remembering state), or more than
one pass?
I was thinking I could sort the data by ID and then by timestamp. In that
case, wouldn't the program only need to look back a few lines at a time?
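The sort-then-scan idea can be sketched in plain Java (no Hadoop classes) to show why one pass is enough once the data is ordered. This is a hedged sketch, not a Hadoop job: `groupById` stands in for what the shuffle would do if a mapper emitted the ID as its key (instead of a word, as in WordCount), and `countForOneId` stands in for a reducer that sorts one ID's records by timestamp and scans them once. In a real job, getting the values to the reducer already sorted is usually done with the "secondary sort" pattern; here the sort is simply done in memory. It keeps the same assumptions as before (X at or before Y, gap of at most 5 seconds inclusive); all names are hypothetical, and records need Java 16 or later.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SortedScanCount {
    // Hypothetical record for one input line: ID, type, Unix timestamp in seconds.
    record Event(String id, String type, long ts) {}

    // Stand-in for the shuffle: group all events by ID, as if the mapper
    // had emitted the ID as the key.
    static Map<String, List<Event>> groupById(List<Event> events) {
        Map<String, List<Event>> groups = new TreeMap<>();
        for (Event e : events) {
            groups.computeIfAbsent(e.id(), k -> new ArrayList<>()).add(e);
        }
        return groups;
    }

    // Stand-in for the reducer: sort one ID's events by timestamp (ties put X
    // before Y, since "X" < "Y"), then scan once, remembering only the latest X.
    // The latest X at or before a Y is the closest one, so it alone decides
    // whether that Y falls inside the window.
    static int countForOneId(List<Event> events, long windowSec) {
        List<Event> sorted = new ArrayList<>(events);
        sorted.sort(Comparator.comparingLong(Event::ts).thenComparing(Event::type));
        boolean seenX = false;
        long lastX = 0;
        int count = 0;
        for (Event e : sorted) {
            if (e.type().equals("X")) {
                seenX = true;
                lastX = e.ts();
            } else if (e.type().equals("Y") && seenX && e.ts() - lastX <= windowSec) {
                count++;
            }
        }
        return count;
    }

    // Runs the "reduce" step for every ID and collects the per-ID counts.
    static Map<String, Integer> countAll(List<Event> events, long windowSec) {
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Event>> g : groupById(events).entrySet()) {
            result.put(g.getKey(), countForOneId(g.getValue(), windowSec));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Event> events = List.of(
                new Event("A1", "X", 1215647404L),
                new Event("A2", "X", 1215647405L),
                new Event("A3", "X", 1215647406L),
                new Event("A1", "Y", 1215647409L));
        System.out.println(countAll(events, 5)); // prints {A1=1, A2=0, A3=0}
    }
}
```

The state carried through the scan is a single timestamp per ID, which is why sorting by ID and timestamp turns this into a single pass.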

This seems like a common problem, so I thought I'd ask whether there is an
example close to it, or whether someone has already written something similar.

P.S. Hadoop Rocks!
