SQL:
SELECT LOG1.* FROM LOG1 LEFT OUTER JOIN LOG2 ON LOG1.recordid = LOG2.recordid
WHERE LOG2.recordid IS NULL;
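A LEFT OUTER JOIN keeps every LOG1 row and fills the LOG2 columns with NULL where there is no match. The missing records are exactly those NULL rows, so the join needs a `WHERE LOG2.recordid IS NULL` filter to act as an anti-join. The sketch below checks this with Python's built-in sqlite3 module. The table layout, the `payload` column, and the sample IDs are hypothetical; only the `recordid` join key comes from the thread.

```python
import sqlite3

# Hypothetical sample data: LOG1 has record IDs 1-5, LOG2 is missing 2 and 4.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE LOG1 (recordid INTEGER, payload TEXT)")
conn.execute("CREATE TABLE LOG2 (recordid INTEGER, payload TEXT)")
conn.executemany("INSERT INTO LOG1 VALUES (?, ?)", [(i, f"a{i}") for i in range(1, 6)])
conn.executemany("INSERT INTO LOG2 VALUES (?, ?)", [(i, f"b{i}") for i in (1, 3, 5)])

# A plain LEFT OUTER JOIN returns every LOG1 row, matched or not.
all_rows = conn.execute(
    "SELECT COUNT(*) FROM LOG1 LEFT OUTER JOIN LOG2 ON LOG1.recordid = LOG2.recordid"
).fetchone()[0]

# Anti-join: keep only LOG1 rows whose LOG2 side came back NULL.
missing = [
    row[0]
    for row in conn.execute(
        "SELECT LOG1.recordid FROM LOG1 "
        "LEFT OUTER JOIN LOG2 ON LOG1.recordid = LOG2.recordid "
        "WHERE LOG2.recordid IS NULL "
        "ORDER BY LOG1.recordid"
    )
]

print(all_rows)  # 5
print(missing)   # [2, 4]
```

The same `LEFT OUTER JOIN ... WHERE right.key IS NULL` pattern is the usual way to express "rows in A with no match in B" in Hive as well.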
PIG:
data = JOIN LOG1 BY recordid LEFT OUTER, LOG2 BY recordid;
missing = FILTER data BY LOG2::recordid IS NULL;
DUMP missing;

If you need more Pig help, please post to the Pig mailing list.

-Bharath

________________________________
From: Mark Kerzner <[email protected]>
To: [email protected]; Bharath Mundlapudi <[email protected]>
Sent: Sunday, June 26, 2011 5:50 PM
Subject: Re: Comparing two logs, finding missing records

Bharath, what would a Pig query look like?

Thank you,
Mark

On Sun, Jun 26, 2011 at 5:12 PM, Bharath Mundlapudi <[email protected]> wrote:

> If you have a SerDe or a Pig loader for your log format, Pig or Hive with a join will probably be the quicker solution.
>
> -Bharath
>
> ________________________________
> From: Mark Kerzner <[email protected]>
> To: Hadoop Discussion Group <[email protected]>
> Sent: Saturday, June 25, 2011 9:39 PM
> Subject: Comparing two logs, finding missing records
>
> Hi,
>
> I have two logs that should contain the same record_ids: if a record_id is found in the first log, it should also be found in the second one. However, I suspect that the second log has been filtered, and I need to find the missing records. Anything is allowed: a MapReduce job, Hive, Pig, or even a NoSQL database.
>
> Thank you.
>
> This is also a good time to thank all the members of the group, who are always very helpful.
>
> Sincerely,
> Mark
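Mark also allows a plain MapReduce job. The usual MapReduce counterpart of the join above is a reduce-side join: the mapper tags each record_id with the log it came from, the shuffle groups tags by record_id, and the reducer emits IDs that were seen only in the first log. The sketch below simulates the three phases in memory with pure Python. It is not the Hadoop API. The "record_id, then a tab" line format and the tags `L1`/`L2` are assumptions, since the thread does not give the real log format.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Set, Tuple

# Assumption: each log line starts with the record_id followed by a tab.


def map_phase(lines: Iterable[str], tag: str) -> Iterator[Tuple[str, str]]:
    """Mapper: emit (record_id, source_tag) for every non-empty line of one log."""
    for line in lines:
        record_id = line.split("\t", 1)[0].strip()
        if record_id:
            yield record_id, tag


def shuffle(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """In-memory stand-in for the shuffle: group all source tags by record_id."""
    groups: Dict[str, Set[str]] = defaultdict(set)
    for record_id, tag in pairs:
        groups[record_id].add(tag)
    return groups


def reduce_phase(groups: Dict[str, Set[str]]) -> List[str]:
    """Reducer: a record is missing if log 1 has it and log 2 does not."""
    return sorted(rid for rid, tags in groups.items() if "L1" in tags and "L2" not in tags)


def find_missing(log1: Iterable[str], log2: Iterable[str]) -> List[str]:
    pairs = list(map_phase(log1, "L1")) + list(map_phase(log2, "L2"))
    return reduce_phase(shuffle(pairs))


if __name__ == "__main__":
    log1 = ["r1\tlogin", "r2\tview", "r3\tlogout"]
    log2 = ["r1\tlogin", "r3\tlogout"]
    print(find_missing(log1, log2))  # ['r2']
```

Tagging by source, rather than loading one log into memory, keeps the job scalable when both logs are large. IDs that appear only in the second log are ignored, matching the question's one-directional check.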
