At 2:12 AM -0700 7/16/09, Amit Saxena wrote:
Hi all,

I need help with an approach for finding the matched and unmatched entries
between two files using Perl.

As each file will have around 10k-50k lines, I don't want to
load the entire file contents into memory.


The fastest approach is usually to load the shorter of the two files into memory, then read the longer file one line at a time, recording for each line whether it matches any record in the shorter file. A hash is the best data structure for this, because each lookup takes roughly constant time. Files of 50k lines should be no problem.
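A minimal Perl sketch of the hash approach, assuming one entry per line and exact string matching. The helper name compare_by_hash is hypothetical. The example opens in-memory filehandles on strings (a core Perl feature since 5.8) so it runs on its own; with real files you would pass handles from open my $fh, '<', $path instead.

```perl
use strict;
use warnings;

# Hypothetical helper: takes two open filehandles (smaller file first)
# and returns array refs of matched lines, lines only in the larger file,
# and lines only in the smaller file.
sub compare_by_hash {
    my ($small_fh, $large_fh) = @_;

    # Only the smaller file is held in memory, as hash keys.
    my %seen;
    while (my $line = <$small_fh>) {
        chomp $line;
        $seen{$line} = 0;    # 0 = not matched yet
    }

    # Stream the larger file; each lookup is a single hash access.
    my (@matched, @only_large);
    while (my $line = <$large_fh>) {
        chomp $line;
        if (exists $seen{$line}) {
            push @matched, $line;
            $seen{$line} = 1;    # remember that this entry was matched
        }
        else {
            push @only_large, $line;
        }
    }

    # Entries of the smaller file that never matched (sorted for stable output).
    my @only_small = sort { $a cmp $b } grep { !$seen{$_} } keys %seen;
    return (\@matched, \@only_large, \@only_small);
}

# Usage with in-memory "files":
my $small_text = "apple\nbanana\ncherry\n";
my $large_text = "banana\ndate\napple\nfig\n";
open my $small_fh, '<', \$small_text or die "open: $!";
open my $large_fh, '<', \$large_text or die "open: $!";

my ($matched, $only_large, $only_small) = compare_by_hash($small_fh, $large_fh);
print "matched: @$matched\n";          # matched: banana apple
print "only in large: @$only_large\n"; # only in large: date fig
print "only in small: @$only_small\n"; # only in small: cherry
```

Because the smaller file's lines become hash keys, duplicate lines in that file collapse into one entry; count them in the hash values if duplicates matter.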

If you really don't want to, or can't, read either file into memory, a method that still requires only one pass over each of the two files is to sort both files and save the sorted copies. Then read one line from each file and compare them. If they are equal, record the match and read the next line from each file. If they differ, the lesser of the two lines (alphabetically speaking) has no match in the other file, so record that, read the next line from the file it came from, and compare again. When one file runs out, every remaining line in the other file is unmatched.
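The merge step can be sketched in Perl as below. This assumes both inputs are already sorted in plain byte order (for example with LC_ALL=C sort), which is the same ordering Perl's cmp operator uses; with a locale-aware sort the comparisons would disagree. The helper name compare_sorted is hypothetical, and in-memory filehandles again stand in for the sorted copies.

```perl
use strict;
use warnings;

# Hypothetical helper: both filehandles must read files sorted in byte order.
# Returns array refs of matched lines, lines only in A, and lines only in B.
sub compare_sorted {
    my ($fh_a, $fh_b) = @_;
    my (@matched, @only_a, @only_b);

    # Read one chomped line; returns undef at end of file.
    my $next = sub {
        my $fh   = shift;
        my $line = <$fh>;
        chomp $line if defined $line;
        return $line;
    };

    my $line_a = $next->($fh_a);
    my $line_b = $next->($fh_b);

    while (defined $line_a && defined $line_b) {
        my $cmp = $line_a cmp $line_b;
        if ($cmp == 0) {
            # Equal: record the match and advance both files.
            push @matched, $line_a;
            $line_a = $next->($fh_a);
            $line_b = $next->($fh_b);
        }
        elsif ($cmp < 0) {
            # A's line is lesser, so it cannot appear later in sorted B.
            push @only_a, $line_a;
            $line_a = $next->($fh_a);
        }
        else {
            push @only_b, $line_b;
            $line_b = $next->($fh_b);
        }
    }

    # Whatever is left in either file has no partner.
    while (defined $line_a) { push @only_a, $line_a; $line_a = $next->($fh_a) }
    while (defined $line_b) { push @only_b, $line_b; $line_b = $next->($fh_b) }

    return (\@matched, \@only_a, \@only_b);
}

# Usage with in-memory, already-sorted "files":
my $text_a = "apple\nbanana\ncherry\n";
my $text_b = "banana\ncherry\ndate\n";
open my $fa, '<', \$text_a or die "open: $!";
open my $fb, '<', \$text_b or die "open: $!";

my ($both, $only_a, $only_b) = compare_sorted($fa, $fb);
print "matched: @$both\n";     # matched: banana cherry
print "only in A: @$only_a\n"; # only in A: apple
print "only in B: @$only_b\n"; # only in B: date
```

For brevity the sketch collects results in arrays; to keep memory use flat on very large inputs, print each result as soon as it is found instead.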

--
Jim Gibson
j...@gibson.org
