I am wondering what the most efficient way would be to handle the
following scenario with MapReduce in Hadoop. Let's say we have the
following data:
time=1, ip=1, a=1
time=2, ip=2, a=2
time=3, ip=2, b=4
time=2, ip=1, b=2
time=4, ip=1, a=4
time=5, ip=2, a=7
time=6, ip=1, c=9
time=7, ip=2, c=11
Each line basically represents a timestamped request from some IP
that provides a certain value. Grouped by IP and sorted by time, it is
easier to read:
time=1, ip=1, a=1
time=2, ip=1, b=2
time=4, ip=1, a=4
time=6, ip=1, c=9
time=2, ip=2, a=2
time=3, ip=2, b=4
time=5, ip=2, a=7
time=7, ip=2, c=11
I would now like to re-create, for each IP, the state of all the
different values at each point in time:
time=1, ip=1, a=1, [b=0, c=0]
time=2, ip=2, a=2, [b=0, c=0]
time=3, ip=2, a=2, b=4, [c=0]
time=2, ip=1, a=1, b=2, [c=0]
time=4, ip=1, a=4, b=2, [c=0]
time=5, ip=2, a=7, b=4, [c=0]
time=6, ip=1, a=4, b=2, c=9
time=7, ip=2, a=7, b=4, c=11
[] = implicit default value
Or, grouped by IP for easier reading:
time=1, ip=1, a=1, b=0, c=0
time=2, ip=1, a=1, b=2, c=0
time=4, ip=1, a=4, b=2, c=0
time=6, ip=1, a=4, b=2, c=9
time=2, ip=2, a=2, b=0, c=0
time=3, ip=2, a=2, b=4, c=0
time=5, ip=2, a=7, b=4, c=0
time=7, ip=2, a=7, b=4, c=11
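
To make the desired transformation concrete, here is a minimal sketch in
plain Python (no Hadoop involved). It only pins down the semantics I am
after; it is not meant as the efficient MapReduce solution. The tuple
layout, the names EVENTS, KEYS and rebuild_states, and the assumption that
the set of value names (a, b, c) is known up front are all mine, purely for
illustration.

```python
from collections import defaultdict

# Hypothetical record layout: (time, ip, key, value), mirroring the sample data.
EVENTS = [
    (1, 1, "a", 1),
    (2, 2, "a", 2),
    (3, 2, "b", 4),
    (2, 1, "b", 2),
    (4, 1, "a", 4),
    (5, 2, "a", 7),
    (6, 1, "c", 9),
    (7, 2, "c", 11),
]

KEYS = ("a", "b", "c")  # assumption: the value names are known in advance
DEFAULT = 0             # the implicit default value


def rebuild_states(events, keys=KEYS, default=DEFAULT):
    """Return one (time, ip, a, b, c) row per event, carrying state forward per IP."""
    # Group the events by IP (roughly what the shuffle would do with ip as the key).
    by_ip = defaultdict(list)
    for time, ip, key, value in events:
        by_ip[ip].append((time, key, value))

    rows = []
    for ip in sorted(by_ip):
        # Every IP starts with all values at the default.
        state = dict.fromkeys(keys, default)
        # Replay this IP's events in time order, updating one value per event.
        for time, key, value in sorted(by_ip[ip]):
            state[key] = value
            rows.append((time, ip) + tuple(state[k] for k in keys))
    return rows


if __name__ == "__main__":
    for row in rebuild_states(EVENTS):
        time, ip, values = row[0], row[1], row[2:]
        fields = ", ".join(f"{k}={v}" for k, v in zip(KEYS, values))
        print(f"time={time}, ip={ip}, {fields}")
```

Here the grouping and the per-IP sort happen in memory; in a real job the
events of one IP would presumably have to reach a single reducer already
ordered by time.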
So, my fellow MapReduce writers... how would one best tackle this?
Suggestions?
cheers
--
Torsten