Thank you, Bharath. Tomorrow I will get the reaction to my solution from the person who actually posed the problem to me, and then we will see what details I might have missed.
Mark

On Sun, Jun 26, 2011 at 8:04 PM, Bharath Mundlapudi <[email protected]> wrote:

> SQL:
> SELECT * FROM LOG1 LEFT OUTER JOIN LOG2 ON LOG1.recordid = LOG2.recordid;
>
> PIG:
> data = JOIN LOG1 BY recordid LEFT OUTER, LOG2 BY recordid;
> DUMP data;
>
> If you need more PIG help, please post in PIG email alias.
>
> -Bharath
>
> ------------------------------
> *From:* Mark Kerzner <[email protected]>
> *To:* [email protected]; Bharath Mundlapudi <[email protected]>
> *Sent:* Sunday, June 26, 2011 5:50 PM
> *Subject:* Re: Comparing two logs, finding missing records
>
> Bharath,
>
> how would a Pig query look like?
>
> Thank you,
> Mark
>
> On Sun, Jun 26, 2011 at 5:12 PM, Bharath Mundlapudi <[email protected]> wrote:
>
> If you have Serde or PigLoader for your log format, probably Pig or Hive
> will be a quicker solution with the join.
>
> -Bharath
>
> ________________________________
> From: Mark Kerzner <[email protected]>
> To: Hadoop Discussion Group <[email protected]>
> Sent: Saturday, June 25, 2011 9:39 PM
> Subject: Comparing two logs, finding missing records
>
> Hi,
>
> I have two logs which should have all the records for the same record_id,
> in other words, if this record_id is found in the first log, it should also
> be found in the second one. However, I suspect that the second log is
> filtered out, and I need to find the missing records. Anything is allowed:
> MapReduce job, Hive, Pig, and even a NoSQL database.
>
> Thank you.
>
> It is also a good time to express my thanks to all the members of the group
> who are always very helpful.
>
> Sincerely,
> Mark
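
A note on the join quoted above: a left outer join keeps every LOG1 row, so the records missing from LOG2 should be the ones where the LOG2 side of the join comes back NULL. Below is a minimal sketch of that idea on toy data, assuming the logs are small enough to try locally. It uses Python's built-in sqlite3 module as a stand-in for Hive, and the function name find_missing and the single-column tables are hypothetical, not something from the thread.

```python
import sqlite3


def find_missing(log1_ids, log2_ids):
    """Return record ids that appear in LOG1 but never in LOG2.

    Hypothetical helper: LOG1/LOG2 are reduced to a single
    recordid column, which is an assumption made for illustration.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE LOG1 (recordid TEXT)")
        conn.execute("CREATE TABLE LOG2 (recordid TEXT)")
        conn.executemany("INSERT INTO LOG1 VALUES (?)", [(r,) for r in log1_ids])
        conn.executemany("INSERT INTO LOG2 VALUES (?)", [(r,) for r in log2_ids])
        # Same LEFT OUTER JOIN as in the thread, plus a filter that keeps
        # only LOG1 rows with no matching LOG2 row (LOG2 side is NULL).
        rows = conn.execute(
            "SELECT DISTINCT LOG1.recordid FROM LOG1 "
            "LEFT OUTER JOIN LOG2 ON LOG1.recordid = LOG2.recordid "
            "WHERE LOG2.recordid IS NULL "
            "ORDER BY LOG1.recordid"
        ).fetchall()
        return [row[0] for row in rows]
    finally:
        conn.close()


if __name__ == "__main__":
    print(find_missing(["a", "b", "c"], ["a", "c"]))
```

The only change from the quoted SQL is the WHERE ... IS NULL filter (and DISTINCT, so a record id repeated in LOG1 is reported once); without the filter the join returns matched and unmatched rows together.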
