Re: how to implement the 'diff' cmd in hadoop

Bejoy Ks Tue, 20 Mar 2012 03:10:28 -0700

Hi Lin
        In your mapper, make the line number the key and the line contents the
value. In your reducer, check whether the two values for a key match. That is,
if you are comparing two files, there will be two values for each line
number. If the values don't match, increment a counter to track the number of
non-matching lines and write those lines to the output file. If the values
match for a key, do nothing; there is no need to write anything to the
output dir.
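Below is a rough, local sketch of the flow described above, assuming two small in-memory "files". It is not Hadoop API code: the map, shuffle (grouping by key) and reduce steps are simulated with plain JDK collections, and the names `LineDiffSketch`, `Tagged`, `map`, `reduce` and `diff` are made up for illustration. Each value is tagged with its source file because Hadoop does not guarantee the order in which a key's values reach the reducer. Also note that Hadoop's default `TextInputFormat` gives the mapper each line's byte offset, not its line number, so a real job would need some other way to obtain line numbers (for example, a preprocessing step that numbers the lines). That part is not shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class LineDiffSketch {

    // A map-phase value: which file the line came from, plus the line text.
    static final class Tagged {
        final String file;
        final String text;
        Tagged(String file, String text) { this.file = file; this.text = text; }
    }

    // "Map": emit (lineNumber, (fileTag, lineText)) for every line of one file.
    // The computeIfAbsent call stands in for the shuffle that groups values by key.
    static void map(String fileTag, List<String> lines, Map<Long, List<Tagged>> shuffle) {
        for (int i = 0; i < lines.size(); i++) {
            long lineNo = i + 1;
            shuffle.computeIfAbsent(lineNo, k -> new ArrayList<>())
                   .add(new Tagged(fileTag, lines.get(i)));
        }
    }

    // "Reduce": compare the values for one line number.
    // Returns null when both files have the same text (nothing is written).
    static String reduce(long lineNo, List<Tagged> values, String fileA, String fileB) {
        String a = null;
        String b = null;
        for (Tagged t : values) {
            if (t.file.equals(fileA)) a = t.text;
            else if (t.file.equals(fileB)) b = t.text;
        }
        if (a != null && a.equals(b)) return null;
        return lineNo + "\t" + (a == null ? "<missing>" : a)
                      + "\t" + (b == null ? "<missing>" : b);
    }

    // Runs map over both inputs, then reduce over each key in sorted order.
    public static List<String> diff(List<String> left, List<String> right) {
        Map<Long, List<Tagged>> shuffle = new TreeMap<>(); // keys sorted, like reducer input
        map("A", left, shuffle);
        map("B", right, shuffle);
        List<String> out = new ArrayList<>();
        long mismatches = 0; // plays the role of a Hadoop counter
        for (Map.Entry<Long, List<Tagged>> e : shuffle.entrySet()) {
            String line = reduce(e.getKey(), e.getValue(), "A", "B");
            if (line != null) {
                out.add(line);
                mismatches++;
            }
        }
        System.out.println("mismatches=" + mismatches);
        return out;
    }

    public static void main(String[] args) {
        List<String> left = List.of("alpha", "beta", "gamma");
        List<String> right = List.of("alpha", "BETA", "gamma", "delta");
        for (String line : diff(left, right)) {
            System.out.println(line);
        }
    }
}
```

With the sample input in `main`, only line 2 (`beta` vs `BETA`) and line 4 (present only in the second file) are reported. Keep in mind that this compares lines by position, so a single inserted or deleted line shifts every later line and they will all be reported as different. That is unlike the Linux `diff` command, which aligns the files first.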


Regards
Bejoy KS

On Tue, Mar 20, 2012 at 2:01 PM, botma lin <[email protected]> wrote:

> Hi, all
>
>      I'm a newbie to hadoop.
>
>      I'm trying to compare two large files and get the difference between
> them, like the diff cmd in linux.
> However, the mapred api only gets one record at a time, so how can I
> get the corresponding records in the two files and compare them using the
> mapred api?
>
>     thanks!
>
