Hi,

1 - I would like to programmatically compare the map output with the reduce input in MR to check that they are equal. To do this, I compute a hash of the output generated by the map and a hash of the input file on the reduce side, and compare the two. The problem is that I'm hashing the whole file rather than each key/value pair, and as a result the hash produced on the map side differs from the one produced on the reduce side. I don't quite understand why the hashes are different. Is there a reason for this?

2 - A possible solution would be to hash each key/value pair instead. For that I need a method that can read the key/value pairs of any possible map output. I would like to write a generic method that reads the map outputs produced on the map side and prints them out, but I can't find a good example to build on. I'm having difficulty working out how to read map output that is written to a file or held in memory on the map side. Can you give me an example of how to read a key/value pair that is stored on disk?

3 - MR uses the Segment class during the sort phase. Does a Segment correspond to one key/value pair in a map output? For example, if the mapper produces the following map output file:

<A, 1>
<B, 2>

does this map output contain 2 segments?

Thanks,
-- Pedro
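
P.S. To make question 2 concrete, this is roughly the kind of per-record hash I have in mind. It is only a sketch: PairDigest is a name I made up, it works on key and value bytes that have already been serialized, and it does not use any Hadoop API, so reading the pairs out of the map output is still the missing piece.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper (not part of Hadoop): an order-independent digest
// over a set of key/value records.
public class PairDigest {
    // Running sum of per-record hashes; long arithmetic wraps modulo 2^64.
    private long sum = 0L;

    // Folds one record, given as serialized key and value bytes, into the digest.
    public void add(byte[] key, byte[] value) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            // Length prefixes keep ("ab", "c") distinct from ("a", "bc").
            md.update(ByteBuffer.allocate(4).putInt(key.length).array());
            md.update(key);
            md.update(ByteBuffer.allocate(4).putInt(value.length).array());
            md.update(value);
            // Use the first 8 bytes of the SHA-256 hash as this record's contribution.
            sum += ByteBuffer.wrap(md.digest()).getLong();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Combines the digest of another set of records (e.g. another map task).
    public void merge(PairDigest other) {
        sum += other.sum;
    }

    public long value() {
        return sum;
    }

    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        PairDigest mapSide = new PairDigest();
        mapSide.add(utf8("A"), utf8("1"));
        mapSide.add(utf8("B"), utf8("2"));

        // Same records, different order.
        PairDigest reduceSide = new PairDigest();
        reduceSide.add(utf8("B"), utf8("2"));
        reduceSide.add(utf8("A"), utf8("1"));

        System.out.println(mapSide.value() == reduceSide.value()); // prints true
    }
}
```

I chose a sum of per-record hashes instead of one hash over the whole stream because I assume the pairs may reach the reducer in a different order than the mapper wrote them, and because digests from several map tasks can then be combined with merge(). It is meant as a consistency check only, not as something strong against deliberate tampering.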
