I am wondering what the most efficient way would be to handle the following scenario with MapReduce in Hadoop. Let's say we have the following data:

  time=1, ip=1, a=1
  time=2, ip=2, a=2
  time=3, ip=2, b=4
  time=2, ip=1, b=2
  time=4, ip=1, a=4
  time=5, ip=2, a=7
  time=6, ip=1, c=9
  time=7, ip=2, c=11

Each line represents a timestamped request from an IP that sets one value. Grouped by IP and sorted by time, the same data is easier to read:

  time=1, ip=1, a=1
  time=2, ip=1, b=2
  time=4, ip=1, a=4
  time=6, ip=1, c=9

  time=2, ip=2, a=2
  time=3, ip=2, b=4
  time=5, ip=2, a=7
  time=7, ip=2, c=11

I would now like to reconstruct, for each request, the state of all values for that IP at that point in time:

  time=1, ip=1, a=1, [b=0, c=0]
  time=2, ip=2, a=2, [b=0, c=0]
  time=3, ip=2, a=2, b=4, [c=0]
  time=2, ip=1, a=1, b=2, [c=0]
  time=4, ip=1, a=4, b=2, [c=0]
  time=5, ip=2, a=7, b=4, [c=0]
  time=6, ip=1, a=4, b=2, c=9
  time=7, ip=2, a=7, b=4, c=11

[] = implicit default value

Or, again grouped by IP for easier reading:

  time=1, ip=1, a=1, b=0, c=0
  time=2, ip=1, a=1, b=2, c=0
  time=4, ip=1, a=4, b=2, c=0
  time=6, ip=1, a=4, b=2, c=9

  time=2, ip=2, a=2, b=0, c=0
  time=3, ip=2, a=2, b=4, c=0
  time=5, ip=2, a=7, b=4, c=0
  time=7, ip=2, a=7, b=4, c=11
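
To pin down the transformation I am after, here is a minimal single-process Python sketch. It is not a Hadoop job, only an illustration of the logic: key each record by ip, order each ip's records by time, and carry the last seen values forward. The names map_line, reduce_ip, rebuild and FIELDS are made up for this example, and it assumes the set of value names (a, b, c) is known up front and defaults to 0.

```python
from collections import defaultdict

# Sample input, copied from the data shown earlier in this post.
LINES = [
    "time=1, ip=1, a=1",
    "time=2, ip=2, a=2",
    "time=3, ip=2, b=4",
    "time=2, ip=1, b=2",
    "time=4, ip=1, a=4",
    "time=5, ip=2, a=7",
    "time=6, ip=1, c=9",
    "time=7, ip=2, c=11",
]

# Assumption: the value names are known in advance; each defaults to 0.
FIELDS = ("a", "b", "c")


def map_line(line):
    """Map step: emit (ip, (time, field, value)) so records group by ip."""
    pairs = dict(part.strip().split("=") for part in line.split(","))
    time = int(pairs.pop("time"))
    ip = int(pairs.pop("ip"))
    # Exactly one value assignment is expected to remain on each line.
    (field, value), = pairs.items()
    return ip, (time, field, int(value))


def reduce_ip(ip, events):
    """Reduce step: replay one ip's events in time order, carrying state."""
    state = dict.fromkeys(FIELDS, 0)
    for time, field, value in sorted(events):
        state[field] = value
        yield {"time": time, "ip": ip, **state}


def rebuild(lines):
    """Simulate map, shuffle (group by ip), and reduce in one process."""
    grouped = defaultdict(list)  # stands in for the shuffle phase
    for line in lines:
        ip, event = map_line(line)
        grouped[ip].append(event)
    return [row for ip in sorted(grouped) for row in reduce_ip(ip, grouped[ip])]


if __name__ == "__main__":
    for row in rebuild(LINES):
        print(", ".join(f"{k}={v}" for k, v in row.items()))
```

The sketch sorts each ip's events in memory inside the reducer. For an ip with a very large number of requests that may not be practical, and I suspect a secondary sort on (ip, time) would be the Hadoop way to get the records to the reducer already ordered, but that is exactly the part I am unsure about.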

So, my fellow MapReduce writers, how would one best tackle this? Suggestions?

cheers
--
Torsten
