Interesting, Bharath, I will look at these.

Mark

On Sun, Jun 26, 2011 at 5:12 PM, Bharath Mundlapudi <[email protected]> wrote:

> If you have Serde or PigLoader for your log format, probably Pig or Hive
> will be a quicker solution with the join.
>
> -Bharath
>
> ________________________________
> From: Mark Kerzner <[email protected]>
> To: Hadoop Discussion Group <[email protected]>
> Sent: Saturday, June 25, 2011 9:39 PM
> Subject: Comparing two logs, finding missing records
>
> Hi,
>
> I have two logs which should have all the records for the same record_id,
> in other words, if this record_id is found in the first log, it should
> also be found in the second one. However, I suspect that the second log is
> filtered out, and I need to find the missing records. Anything is allowed:
> MapReduce job, Hive, Pig, and even a NoSQL database.
>
> Thank you.
>
> It is also a good time to express my thanks to all the members of the group
> who are always very helpful.
>
> Sincerely,
> Mark
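
For reference, the join suggested in the thread amounts to an anti-join on record_id: keep the records of the first log whose record_id never appears in the second. Below is a minimal sketch of that idea in plain Python. The log format is not given in the thread, so the sketch assumes tab-separated lines with record_id in the first field; the function names `record_id` and `missing_records` are hypothetical.

```python
from typing import Iterable, Iterator


def record_id(line: str) -> str:
    """Extract the record_id from one log line.

    Assumption (not from the thread): fields are tab-separated and
    the record_id is the first field.
    """
    return line.rstrip("\n").split("\t", 1)[0]


def missing_records(first_log: Iterable[str],
                    second_log: Iterable[str]) -> Iterator[str]:
    """Yield lines of first_log whose record_id is absent from second_log."""
    # Collect every record_id present in the second log, skipping blank lines.
    seen = {record_id(line) for line in second_log if line.strip()}
    # Stream the first log and keep only records with no match.
    for line in first_log:
        if line.strip() and record_id(line) not in seen:
            yield line.rstrip("\n")


if __name__ == "__main__":
    # Tiny made-up sample, for illustration only.
    first = ["a1\tlogin\n", "b2\tclick\n", "c3\tlogout\n"]
    second = ["a1\tlogin\n", "c3\tlogout\n"]
    for record in missing_records(first, second):
        print(record)
```

This version holds all record_ids of the second log in memory, so it only suits logs where that set fits on one machine. For larger logs, the same logic can likely be expressed in Hive or Pig as a left outer join on record_id that keeps the rows with no match on the second log's side.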
