Re: Comparing two logs, finding missing records

Mark Kerzner Sun, 26 Jun 2011 19:21:16 -0700

Thank you, Bharath. Tomorrow I will get a reaction to my solution from the
person who posed the problem to me, and then we will see what details I
might have missed.


Mark

On Sun, Jun 26, 2011 at 8:04 PM, Bharath Mundlapudi
<[email protected]> wrote:

> SQL:
> SELECT LOG1.* FROM LOG1 LEFT OUTER JOIN LOG2 ON LOG1.recordid = LOG2.recordid
> WHERE LOG2.recordid IS NULL;
>
> PIG:
> data = JOIN LOG1 BY recordid LEFT OUTER, LOG2 BY recordid;
> missing = FILTER data BY LOG2::recordid IS NULL;
> DUMP missing;
>
> If you need more Pig help, please post to the Pig email alias.
>
> -Bharath
>
> ------------------------------
> *From:* Mark Kerzner <[email protected]>
> *To:* [email protected]; Bharath Mundlapudi <[email protected]>
> *Sent:* Sunday, June 26, 2011 5:50 PM
> *Subject:* Re: Comparing two logs, finding missing records
>
> Bharath,
>
> what would a Pig query look like?
>
> Thank you,
> Mark
>
> On Sun, Jun 26, 2011 at 5:12 PM, Bharath Mundlapudi <[email protected]> wrote:
>
> If you have a SerDe or a Pig loader for your log format, Pig or Hive with a
> join will probably be the quicker solution.
>
> -Bharath
>
>
>
> ________________________________
> From: Mark Kerzner <[email protected]>
> To: Hadoop Discussion Group <[email protected]>
> Sent: Saturday, June 25, 2011 9:39 PM
> Subject: Comparing two logs, finding missing records
>
> Hi,
>
> I have two logs that should contain the same record_ids: if a record_id is
> found in the first log, it should also be found in the second one. However,
> I suspect that the second log has been filtered, and I need to find the
> missing records. Anything is allowed: a MapReduce job, Hive, Pig, or even a
> NoSQL database.
>
> Thank you.
>
> It is also a good time to express my thanks to all the members of the group
> who are always very helpful.
>
> Sincerely,
> Mark
>
>
>
>
>
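
For anyone who wants to try the join idea from this thread on a small sample before running it on a cluster, here is a minimal sketch using Python's built-in sqlite3 module. It is only an illustration, not what anyone in the thread actually ran. The function name find_missing_record_ids and the lowercase table names log1 and log2 are made up for the example, and it assumes the record ids have already been extracted from both logs. The key step is keeping only the rows where the right side of the left outer join is NULL, since those are the ids that never appear in the second log.

```python
import sqlite3


def find_missing_record_ids(log1_ids, log2_ids):
    """Return the record ids present in log1 but absent from log2.

    Hypothetical helper: loads both id lists into an in-memory SQLite
    database and runs a left outer join, keeping only unmatched rows.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE log1 (recordid TEXT)")
        conn.execute("CREATE TABLE log2 (recordid TEXT)")
        conn.executemany("INSERT INTO log1 VALUES (?)", [(r,) for r in log1_ids])
        conn.executemany("INSERT INTO log2 VALUES (?)", [(r,) for r in log2_ids])
        rows = conn.execute(
            "SELECT DISTINCT log1.recordid "
            "FROM log1 LEFT OUTER JOIN log2 ON log1.recordid = log2.recordid "
            "WHERE log2.recordid IS NULL "  # no match found in the second log
            "ORDER BY log1.recordid"
        ).fetchall()
        return [row[0] for row in rows]
    finally:
        conn.close()


if __name__ == "__main__":
    # Tiny made-up sample: "b" and "c" never reach the second log.
    print(find_missing_record_ids(["a", "b", "c", "b"], ["a"]))
```

The DISTINCT is there because a log may contain several lines for the same record id. Drop it if every missing line should be reported rather than every missing id.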
