If you have a SerDe or a Pig loader for your log format, Pig or Hive will probably be the quicker solution, using a join on record_id.
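The join in question is effectively a left outer join on record_id that keeps only the rows with no match in the second log. As a minimal sketch of that logic in plain Python rather than Pig or Hive, assuming each log line is tab-separated with record_id as the first field (the line format and the names `record_id` and `missing_records` are hypothetical, not taken from the actual logs):

```python
from typing import Iterable, List


def record_id(line: str) -> str:
    # Assumption: record_id is the first tab-separated field of a log line.
    return line.rstrip("\n").split("\t", 1)[0]


def missing_records(first_log: Iterable[str], second_log: Iterable[str]) -> List[str]:
    """Return the lines of first_log whose record_id never appears in second_log."""
    # Collect every record_id present in the second log, skipping blank lines.
    seen = {record_id(line) for line in second_log if line.strip()}
    # Keep the first-log lines whose record_id has no match (an anti-join).
    return [
        line.rstrip("\n")
        for line in first_log
        if line.strip() and record_id(line) not in seen
    ]


if __name__ == "__main__":
    first = ["1\tlogin", "2\tclick", "3\tlogout"]
    second = ["1\tlogin", "3\tlogout"]
    for line in missing_records(first, second):
        print(line)
```

This version holds all record_ids from the second log in memory, so it only suits logs of modest size; for larger logs the same anti-join expressed in Pig or Hive is the scalable route.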
-Bharath

________________________________
From: Mark Kerzner <[email protected]>
To: Hadoop Discussion Group <[email protected]>
Sent: Saturday, June 25, 2011 9:39 PM
Subject: Comparing two logs, finding missing records

Hi,

I have two logs which should have all the records for the same record_id, in other words, if this record_id is found in the first log, it should also be found in the second one. However, I suspect that the second log is filtered out, and I need to find the missing records. Anything is allowed: MapReduce job, Hive, Pig, and even a NoSQL database.

Thank you. It is also a good time to express my thanks to all the members of the group who are always very helpful.

Sincerely,
Mark
