It seems like this problem could be done with a single map-reduce job.
In your map, emit (ID, {type, timestamp}) for each input line, so that
all records for the same ID arrive at the same reduce call.
In your reduce, you can then figure out how many A1 Y's appear close to
an A1 X. One naive approach is to iterate through all of the
{type, timestamp} values for the ID and collect them in a list. Then, if
your custom value class implements Comparable (ordering by timestamp),
you can just call Collections.sort(myList) and scan the sorted list.
I'm sure there are faster solutions (perhaps you could sort them as you
iterate through by hashing based on timestamp?).
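Here is a rough sketch of that reduce-side logic in plain Java, with the Hadoop plumbing left out. The names WindowCount, Event and countYWithinWindow are made up for illustration. It also assumes that "within 5 seconds" means a Y that arrives at most 5 seconds (inclusive) after the nearest earlier X with the same ID; adjust the comparison if you meant something else.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: not part of Hadoop, just the per-ID counting logic.
class WindowCount {

    /** One {type, timestamp} value, as emitted by the map step for a given ID. */
    static final class Event implements Comparable<Event> {
        final String type;
        final long timestamp;

        Event(String type, long timestamp) {
            this.type = type;
            this.timestamp = timestamp;
        }

        /** Order by timestamp; on ties order by type, so "X" sorts before "Y". */
        @Override
        public int compareTo(Event other) {
            int byTime = Long.compare(timestamp, other.timestamp);
            return byTime != 0 ? byTime : type.compareTo(other.type);
        }
    }

    /**
     * Counts Y events that occur at most windowSeconds after the most recent
     * X event. The list must hold the events of a single ID only.
     */
    static int countYWithinWindow(List<Event> eventsForOneId, long windowSeconds) {
        // Copy before sorting so the caller's list is left untouched.
        List<Event> sorted = new ArrayList<>(eventsForOneId);
        Collections.sort(sorted);

        int count = 0;
        boolean seenX = false;
        long lastX = 0L;
        for (Event e : sorted) {
            if (e.type.equals("X")) {
                // In sorted order, the latest X is always the closest earlier X.
                seenX = true;
                lastX = e.timestamp;
            } else if (e.type.equals("Y") && seenX
                    && e.timestamp - lastX <= windowSeconds) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // The two A1 rows from the sample data in the question.
        List<Event> a1 = Arrays.asList(
                new Event("X", 1215647404L),
                new Event("Y", 1215647409L));
        System.out.println("A1 Y within 5s of A1 X: " + countYWithinWindow(a1, 5));
    }
}
```

In a real job you would fill the list from the values handed to reduce for one ID and emit the returned count as that ID's output value.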
Does this answer your question?
On Jul 9, 2008, at 4:59 PM, Elia Mazzawi wrote:
Can someone point me to an example I can learn from?
I have a data set that looks like this:
ID type Timestamp
A1 X 1215647404
A2 X 1215647405
A3 X 1215647406
A1 Y 1215647409
I want to count how many A1 Y records show up within 5 seconds of an
A1 X.
I've written a few Hadoop programs already, but they were based on the
WordCount example and so only work with one line at a time.
Does this problem require looking back or remembering state, or more
than one pass?
I was thinking that it is possible to sort the data by ID, then by
timestamp. In that case, would the program only need to look back a few
lines at a time?
This seems like a common problem, so I thought I'd ask if there is an
example that is close to it, or if someone has written something already.
P.S. Hadoop Rocks!