Thanks Bejoy, that makes sense.
If I want to know which file each differing record came from, I need to
put an extra file id into the mapper's output value, then read it back in
the reducer.
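
A minimal sketch of what I mean, written as a plain-Java simulation of the map, shuffle and reduce steps rather than against the real Hadoop API. Every name in it (LineDiffSketch, Tagged, map, reduce, diff) is hypothetical, and it assumes the line number is already available to the mapper, which may not hold for the default input format.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of map -> shuffle -> reduce; no Hadoop classes are used.
// All class and method names here are hypothetical, for illustration only.
class LineDiffSketch {

    // Mapper output value: the line text plus the id of the file it came from.
    static final class Tagged {
        final String fileId;
        final String text;

        Tagged(String fileId, String text) {
            this.fileId = fileId;
            this.text = text;
        }
    }

    // "Map" step: emit (line number, tagged line) for every line of one file.
    // Grouping values by key in the sorted map stands in for the shuffle phase.
    static void map(String fileId, List<String> lines, Map<Long, List<Tagged>> shuffle) {
        long lineNo = 1;
        for (String line : lines) {
            shuffle.computeIfAbsent(lineNo, k -> new ArrayList<>()).add(new Tagged(fileId, line));
            lineNo++;
        }
    }

    // "Reduce" step for one line number: emit the tagged values when they differ,
    // or when the line exists in only one file. Matching lines emit nothing.
    static List<String> reduce(long lineNo, List<Tagged> values) {
        List<String> out = new ArrayList<>();
        boolean same = values.size() == 2
                && values.get(0).text.equals(values.get(1).text);
        if (!same) {
            for (Tagged t : values) {
                out.add(lineNo + "\t" + t.fileId + "\t" + t.text);
            }
        }
        return out;
    }

    // Runs both "map" passes, then reduces each key in line-number order.
    static List<String> diff(List<String> fileA, List<String> fileB) {
        Map<Long, List<Tagged>> shuffle = new TreeMap<>();
        map("A", fileA, shuffle);
        map("B", fileB, shuffle);
        List<String> result = new ArrayList<>();
        for (Map.Entry<Long, List<Tagged>> e : shuffle.entrySet()) {
            result.addAll(reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> a = Arrays.asList("alpha", "beta", "gamma");
        List<String> b = Arrays.asList("alpha", "BETA", "gamma", "delta");
        for (String line : diff(a, b)) {
            System.out.println(line);
        }
    }
}
```

With the file id carried in the value, the reducer output shows which side each differing line came from, and a line that exists in only one file is reported with just that file's id.
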
Do you have any other ideas?
Thanks!
On Tue, Mar 20, 2012 at 6:09 PM, Bejoy Ks <[email protected]> wrote:
> Hi Lin
> In your mapper, make the line number the key and the line contents
> the value. In your reducer, check whether the two values for a key
> match; i.e., if you are comparing two files, there will be two values
> for each line number. If they do not match, increment a counter to
> track the number of non-matching lines and write those lines to the
> output file. If the values match for a key, do nothing; there is no
> need even to write to the output dir.
>
> Regards
> Bejoy KS
>
> On Tue, Mar 20, 2012 at 2:01 PM, botma lin <[email protected]> wrote:
>
> > Hi, all
> >
> > I'm a newbie to Hadoop.
> >
> > I'm trying to compare two large files and get the differences between
> > them, like the diff command in Linux.
> > However, the mapred API only gives me one record at a time, so how can I
> > get the corresponding records in the two files and compare them using
> > the mapred API?
> >
> > Thanks!
> >
>