On 09/26/2011 11:58 AM, Sharan34140 wrote:
I have had this question for quite a long time. It may even be absurd, but I
need a solution.
How can we efficiently compare two files, each containing terabytes of
records?
This could be related to external sorting as well,
but I couldn't find an efficient solution to it.
Can somebody please help me understand how to proceed?
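Since the question mentions external sorting: once both files are sorted on the same key (for example by an external merge sort), the comparison itself becomes one streaming pass, much like the merge step of merge sort. Below is a minimal Python sketch. It assumes one record per line and that both inputs are already sorted in the same order. The function names `sorted_diff` and `diff_sorted_files` are hypothetical and used only for illustration.

```python
def sorted_diff(left_lines, right_lines):
    """Yield ('<', rec) for records only in left and ('>', rec) for records
    only in right. Assumes both iterables are sorted ascending, e.g. the
    output of an external sort. Duplicates are handled as a multiset."""
    sentinel = object()
    left, right = iter(left_lines), iter(right_lines)
    a, b = next(left, sentinel), next(right, sentinel)
    while a is not sentinel and b is not sentinel:
        if a == b:                      # present in both: advance both sides
            a, b = next(left, sentinel), next(right, sentinel)
        elif a < b:                     # a cannot appear later in right
            yield ('<', a)
            a = next(left, sentinel)
        else:                           # b cannot appear later in left
            yield ('>', b)
            b = next(right, sentinel)
    while a is not sentinel:            # leftovers exist only in left
        yield ('<', a)
        a = next(left, sentinel)
    while b is not sentinel:            # leftovers exist only in right
        yield ('>', b)
        b = next(right, sentinel)


def diff_sorted_files(path_a, path_b):
    """Stream two pre-sorted text files; only one line per file is held in memory."""
    with open(path_a) as fa, open(path_b) as fb:
        # Strip the newline so a final line without '\n' still compares equal.
        yield from sorted_diff((l.rstrip('\n') for l in fa),
                               (l.rstrip('\n') for l in fb))
```

Memory use stays constant no matter how large the files are, because only the current record from each side is held at a time. The expensive part is producing the sorted inputs, which is where external sorting comes in.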
Before proceeding, can you provide us with more details? For example, does
the comparison involve comparing the files line by line and displaying the
diff, or comparing them record by record? In either case, one might have to
extend FileInputFormat with a custom input format that accepts the two files
in question and processes them line by line or record by record. Then, in the
map, we can emit the diff with the record number as the key and the diff as
the value. I have not tried this; it would be interesting if someone with
experience could shed some light.
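The map-side idea above can be sketched without Hadoop as plain Python. This assumes the hard part is already solved: some custom input format delivers the two files' records aligned by position. The map logic then just emits (record number, diff) pairs for positions that differ. The name `map_record_diffs`, the 1-based numbering, and the use of `None` for a missing record are illustrative assumptions, not part of any Hadoop API.

```python
from itertools import zip_longest


def map_record_diffs(left_records, right_records):
    """Yield (record_number, (left, right)) for every position where the two
    inputs differ. Records are paired by position (1-based); if one input is
    shorter, its missing records appear as None."""
    pairs = zip_longest(left_records, right_records)
    for recno, (left, right) in enumerate(pairs, start=1):
        if left != right:
            # Key = record number, value = the differing pair (the "diff").
            yield recno, (left, right)
```

One caveat of comparing by position: a single inserted or deleted record shifts every later record, so everything after it is reported as different. Sorting both files on a record key first, as the question suggests, avoids that.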
Thanks
Prashant