Thanks a lot!
On Tue, Mar 20, 2012 at 7:13, Bejoy Ks <[email protected]> wrote:
> Yes, if you are comparing more than two files, the mapper needs to emit the
> file name/id. If it is just two files and you only want to know which lines
> differ, the line number alone is enough. But if you want more granular
> information, such as exactly which files contain each change, prefix the
> mapper's output value with something like the file name.
>
> Regards
> Bejoy KS
>
> 2012/3/20 botma lin <[email protected]>
>
> > Thanks Bejoy, that makes sense.
> >
> > If I want to know which original file a differing record came from, I
> > need to put an extra file id into the mapper's output value and then read
> > it in the reducer.
> >
> > Do you have any other ideas?
> >
> > Thanks!
> >
> >
> > On Tue, Mar 20, 2012 at 6:09 PM, Bejoy Ks <[email protected]> wrote:
> >
> > > Hi Lin
> > > In your mapper, make the line number the key and the line contents the
> > > value. In your reducer, check whether the values for a key match; i.e.,
> > > if you are comparing two files, there will be two values for each line
> > > number. If they do not match, increment a counter to track the number of
> > > non-matching lines and write those lines to the output file. If the
> > > values match, do nothing; there is no need to write anything to the
> > > output directory.
> > >
> > > Regards
> > > Bejoy KS
> > >
> > > On Tue, Mar 20, 2012 at 2:01 PM, botma lin <[email protected]> wrote:
> > >
> > > > Hi, all
> > > >
> > > > I'm a newbie to Hadoop.
> > > >
> > > > I'm trying to compare two large files and get the differences between
> > > > them, like the diff command in Linux. However, the mapred API only
> > > > gives me one record at a time, so how can I bring together the
> > > > corresponding records from the two files and compare them using the
> > > > mapred API?
> > > >
> > > > Thanks!
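The approach Bejoy describes (line number as the key, file id plus line contents as the value, reducer compares the grouped values and counts mismatches) can be sketched as a small in-memory simulation. This is not the Hadoop API: `map_lines`, `shuffle`, `reduce_line` and `diff_job` are hypothetical names, the `shuffle` step stands in for the framework's group-by-key, and the counter is a plain dict rather than a Hadoop job counter. It also assumes line numbers are already available; Hadoop's `TextInputFormat` keys each record by its byte offset, not its line number. Treating a line that exists in only some files as a mismatch is an added assumption beyond the thread.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

# Hypothetical in-memory stand-in for a MapReduce job (not the Hadoop API).
Value = Tuple[str, str]  # (file_id, line_contents)


def map_lines(file_id: str, lines: Iterable[str]) -> Iterator[Tuple[int, Value]]:
    """Mapper: emit (line_no, (file_id, contents)) for every line of one file."""
    for line_no, contents in enumerate(lines, start=1):
        yield line_no, (file_id, contents)


def shuffle(pairs: Iterable[Tuple[int, Value]]) -> Dict[int, List[Value]]:
    """Group mapper output by key, sorted by key, as the framework would."""
    groups: Dict[int, List[Value]] = {}
    for key, value in pairs:
        groups.setdefault(key, []).append(value)
    return dict(sorted(groups.items()))


def reduce_line(line_no: int, values: List[Value], num_files: int,
                counters: Counter) -> Optional[Tuple[int, List[Value]]]:
    """Reducer: output the line only if the files disagree on it."""
    distinct = {contents for _, contents in values}
    if len(values) == num_files and len(distinct) == 1:
        return None  # every file has the same line here: write nothing
    # Assumption: a line missing from some file also counts as non-matching.
    counters["NON_MATCHING_LINES"] += 1
    return line_no, sorted(values)


def diff_job(files: Dict[str, List[str]]):
    """Run map -> shuffle -> reduce over all files; return (output, counters)."""
    pairs = [kv for fid, lines in files.items() for kv in map_lines(fid, lines)]
    counters: Counter = Counter()
    output = []
    for line_no, values in shuffle(pairs).items():
        record = reduce_line(line_no, values, len(files), counters)
        if record is not None:
            output.append(record)
    return output, dict(counters)


if __name__ == "__main__":
    out, counts = diff_job({"a.txt": ["x", "y", "z"],
                            "b.txt": ["x", "Y", "z", "w"]})
    print(out)     # [(2, [('a.txt', 'y'), ('b.txt', 'Y')]), (4, [('b.txt', 'w')])]
    print(counts)  # {'NON_MATCHING_LINES': 2}
```

Because the file id travels in the value, the output says which file each differing line came from, which is what the follow-up question asks for. Note that this is a positional comparison: a single inserted line shifts every later line number, so unlike `diff` it will report everything after the insertion as changed.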
