help with hadoop program

Elia Mazzawi Wed, 09 Jul 2008 17:00:14 -0700

Can someone point me to an example I can learn from?

I have a data set that looks like this:


ID    Type   Timestamp

A1    X      1215647404
A2    X      1215647405
A3    X      1215647406
A1    Y      1215647409

I want to count how many times an A1 Y shows up within 5 seconds of an A1 X.

I've written a few Hadoop programs already, but they were based on the
WordCount example, so they only work with one line at a time.
Does this problem require looking back or remembering state, or more than
one pass?
I was thinking it might be possible to sort the data by ID and then by
timestamp, so that the program only needs to look back a few lines at a time?
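
To make the idea concrete, here is a rough sketch in plain Java (no Hadoop
classes, so it is not a real job) of the per-ID logic I imagine a reducer
would run once the records are grouped by ID. The class name WindowCount and
the method countMatches are made up for illustration. It assumes a Y only
counts if it arrives at or after an X with the same ID, at most 5 seconds
later (inclusive), which may not be the right reading of "within".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Hadoop: simulates "group by ID, then scan by time".
class WindowCount {
    // Assumed window: a Y matches if 0 <= tY - tX <= 5 for the latest X before it.
    static final long WINDOW_SECONDS = 5;

    static int countMatches(List<String> lines) {
        // Stand-in for the map + shuffle phase: collect timestamps per ID and type.
        Map<String, List<Long>> xTimes = new HashMap<>();
        Map<String, List<Long>> yTimes = new HashMap<>();
        for (String line : lines) {
            String[] f = line.trim().split("\\s+");
            if (f.length != 3) continue;              // skip blank or malformed lines
            long ts;
            try {
                ts = Long.parseLong(f[2]);
            } catch (NumberFormatException e) {
                continue;                             // skips the header row
            }
            if (f[1].equals("X")) {
                xTimes.computeIfAbsent(f[0], k -> new ArrayList<>()).add(ts);
            } else if (f[1].equals("Y")) {
                yTimes.computeIfAbsent(f[0], k -> new ArrayList<>()).add(ts);
            }
        }

        // Stand-in for the reduce phase: one pass per ID over time-sorted records.
        int count = 0;
        for (Map.Entry<String, List<Long>> e : yTimes.entrySet()) {
            List<Long> xs = xTimes.get(e.getKey());
            if (xs == null) continue;                 // this ID never had an X
            List<Long> ys = e.getValue();
            Collections.sort(xs);
            Collections.sort(ys);
            int i = 0;                                // number of X's at or before the current Y
            for (long y : ys) {
                while (i < xs.size() && xs.get(i) <= y) i++;
                // xs.get(i - 1) is the most recent X at or before this Y
                if (i > 0 && y - xs.get(i - 1) <= WINDOW_SECONDS) count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
            "A1    X   1215647404",
            "A2    X   1215647405",
            "A3    X   1215647406",
            "A1   Y   1215647409");
        System.out.println(countMatches(sample));
    }
}
```

For the four sample rows above this gives 1, since the A1 Y arrives exactly
5 seconds after the A1 X. In a real job I assume the mapper would emit the ID
as the key so that the framework does the grouping, and only the inner loop
would live in the reducer.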

This seems like a common problem, so I thought I'd ask whether there is an
example close to it, or whether someone has already written something.

P.S. Hadoop Rocks!
