Re: help with hadoop program

Mori Bellamy Wed, 09 Jul 2008 17:06:08 -0700

It seems like this problem could be done with one map-reduce task.
From your input, map out (ID,{type,TimeStamp})

in your reduce, you can figure out how many A1's appear close to each other. one naive approach is to iterate through all of the sets and collect them in some collection class. Then, if your custom set class implements Comparable, you can just call Collections.sort(myList). i'm sure there are faster solutions (perhaps you could sort them as you iterate through by hashing based on timestamp?)
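
Here is a rough sketch of the reduce-side logic in plain Java, without the Hadoop classes, so it can be tried on its own. The names Event, CloseEventCounter and countYNearX are made up for illustration. It also assumes that "within 5 seconds" means the Y arrives at or after the most recent X, with the bound inclusive, and that each Y is counted at most once. Adjust to taste if you meant something else.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical value type: one {type, timestamp} pair that the map step
// would emit under a given ID.
final class Event implements Comparable<Event> {
    final String type;
    final long timestamp; // seconds, as in the sample data

    Event(String type, long timestamp) {
        this.type = type;
        this.timestamp = timestamp;
    }

    // Order by timestamp; on ties, order by type so "X" sorts before "Y".
    @Override
    public int compareTo(Event other) {
        int byTime = Long.compare(timestamp, other.timestamp);
        return byTime != 0 ? byTime : type.compareTo(other.type);
    }
}

final class CloseEventCounter {
    // What the body of reduce() could do with all values for one ID:
    // sort by timestamp, then count Y events that follow the most recent X
    // by at most windowSeconds (assumed inclusive).
    static int countYNearX(List<Event> valuesForOneId, long windowSeconds) {
        List<Event> sorted = new ArrayList<>(valuesForOneId); // leave the input untouched
        Collections.sort(sorted);

        boolean seenX = false;
        long lastX = 0L;
        int count = 0;
        for (Event e : sorted) {
            if (e.type.equals("X")) {
                seenX = true;
                lastX = e.timestamp;
            } else if (e.type.equals("Y") && seenX
                    && e.timestamp - lastX <= windowSeconds) {
                count++;
            }
        }
        return count;
    }
}
```

In a real job the list would be filled from the values iterator handed to reduce() for each ID. This holds all values for one ID in memory, which is the naive part; it is only reasonable if no single ID has a huge number of records.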


does this answer your question?

On Jul 9, 2008, at 4:59 PM, Elia Mazzawi wrote:

can someone point me to an example i can learn from.

I have a data set that looks like this:

ID    type   Timestamp
A1    X      1215647404
A2    X      1215647405
A3    X      1215647406
A1    Y      1215647409

I want to count how many A1 Y, show up within 5 seconds of an A1 X

I've written a few hadoop programs already but they were based on the
wordcount example. and so only work with 1 line at a time.
This problem requires looking back or remembering state? or more than
one pass?
I was thinking that it is possible to sort the data by ID, timestamp.
then in that case the program only needs to look back a few lines at a time?

seems like a common problem so i thought I'd ask if there was an example
that is close to that or if someone has written something already.

P.S. Hadoop Rocks!
