On 09/26/2011 11:58 AM, Sharan34140 wrote:
I have had this doubt for quite a long time. It could even be absurd, but I
need a solution.
How do we efficiently compare 2 files, each containing terabytes of
records?
This could be related to external sorting as well,
but I couldn't find an efficient solution to it.
Can somebody please help in understanding how to proceed?
Before proceeding, can you provide us with more details? For example, does the comparison have to go through the files line by line and display the diff, or is it done per record? In either case, one might have to extend FileInputFormat so that it accepts the two files in question and reads them line by line or record by record. Then, in map, we can emit the diff with the record number as key and the diff as value. I have not tried this; it would be interesting if someone with experience could throw some light on it.
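
To make the idea a bit more concrete, here is a rough single-machine sketch in Python of the per-record logic only. It is not Hadoop code and I have not run it at terabyte scale. The names diff_records and diff_files are made up for illustration, and it assumes both files hold one record per line in the same order, so record N of one file is compared with record N of the other. In a real job the custom input format would have to hand the mapper such aligned pairs.

```python
from itertools import zip_longest


def _strip(line):
    """Drop the trailing newline; None marks a record missing from one input."""
    return None if line is None else line.rstrip("\n")


def diff_records(records_a, records_b):
    """Yield (record_number, (a, b)) for each position where the inputs differ.

    Both arguments are iterables of lines. Record numbers start at 1.
    """
    # zip_longest pads the shorter input with None, so extra trailing
    # records in either input are reported as differences too.
    pairs = zip_longest(records_a, records_b)
    for number, (a, b) in enumerate(pairs, start=1):
        a, b = _strip(a), _strip(b)
        if a != b:
            yield number, (a, b)


def diff_files(path_a, path_b):
    """Stream two text files and yield their differing records."""
    with open(path_a, encoding="utf-8") as fa, open(path_b, encoding="utf-8") as fb:
        # File objects are read lazily, one line at a time, so memory use
        # does not grow with file size.
        yield from diff_records(fa, fb)
```

Calling diff_records(["x\n", "y\n", "z\n"], ["x\n", "Y\n"]) yields (2, ("y", "Y")) and then (3, ("z", None)), which mirrors the "record number as key, diff as value" output mentioned above. If the two files are not in the same order, this positional comparison does not apply as written; both inputs would first have to be sorted on a common key, which is where external sorting would come in.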

Thanks
Prashant
