At 2:12 AM -0700 7/16/09, Amit Saxena wrote:
> Hi all,
> I need help regarding the approach to find out matched and unmatched entries
> between two files using perl.
> As the number of lines in the files would be around 10k-50k, I don't want to
> load entire file contents into memory.
The fastest approach is usually to load the shorter of the two files
into memory, then read the longer file one line at a time, recording
whether each line matches any record in the shorter file. A hash is
best for this method. Files of 50k lines should be no problem.
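
The hash approach might look something like the sketch below. It is
only an illustration: the subroutine name compare_with_hash and the
three result lists are made up for this example, and it assumes that
an "entry" is a whole line compared exactly (after chomp).

```perl
use strict;
use warnings;

# Hypothetical helper: loads $short_file into a hash, then streams
# $long_file one line at a time. Returns three array refs: lines of the
# long file found in the short file, lines found only in the long file,
# and lines found only in the short file.
sub compare_with_hash {
    my ($short_file, $long_file) = @_;

    # Key = line text, value = number of times it was seen in the long file.
    my %seen;
    open my $short_fh, '<', $short_file
        or die "Cannot open $short_file: $!";
    while (my $line = <$short_fh>) {
        chomp $line;
        $seen{$line} = 0;
    }
    close $short_fh;

    my (@matched, @only_in_long);
    open my $long_fh, '<', $long_file
        or die "Cannot open $long_file: $!";
    while (my $line = <$long_fh>) {
        chomp $line;
        if (exists $seen{$line}) {
            $seen{$line}++;
            push @matched, $line;
        }
        else {
            push @only_in_long, $line;
        }
    }
    close $long_fh;

    # Anything in the hash that was never hit exists only in the short file.
    my @only_in_short = sort grep { $seen{$_} == 0 } keys %seen;

    return (\@matched, \@only_in_long, \@only_in_short);
}
```

Only the shorter file is ever held in memory; the longer one is read a
line at a time. If the entries are really one field of each line
rather than the whole line, the hash key would be that field instead.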
If you really don't want to, or can't, read one of the files into
memory, then a method that still requires only one pass over each of
the two files is to sort the files and save the sorted copies. Then,
read one line from each file and compare. If they are equal, record
this fact and read the next line from each file. If they do not match,
record the fact and read the next line from the file that held the
lesser of the two lines, alphabetically speaking, then compare again.
When one file runs out, the lines remaining in the other are unmatched.
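
A rough sketch of that merge-style comparison follows. The names
next_line and compare_sorted are invented for this example. It assumes
both files have already been sorted in the same order that Perl's cmp
operator uses (for plain byte strings, running the system sort utility
with LC_ALL=C should give that order), and that whole lines are being
compared.

```perl
use strict;
use warnings;

# Hypothetical helper: read one line and strip the newline;
# returns undef at end of file.
sub next_line {
    my ($fh) = @_;
    my $line = <$fh>;
    chomp $line if defined $line;
    return $line;
}

# Hypothetical helper: walks two already-sorted files in step, holding
# only one line from each in memory. Returns three array refs: matched
# lines, lines only in $file_a, and lines only in $file_b.
sub compare_sorted {
    my ($file_a, $file_b) = @_;
    open my $fh_a, '<', $file_a or die "Cannot open $file_a: $!";
    open my $fh_b, '<', $file_b or die "Cannot open $file_b: $!";

    my (@matched, @only_a, @only_b);
    my $line_a = next_line($fh_a);
    my $line_b = next_line($fh_b);

    while (defined $line_a and defined $line_b) {
        my $order = $line_a cmp $line_b;
        if ($order == 0) {
            # Equal: record the match and advance both files.
            push @matched, $line_a;
            $line_a = next_line($fh_a);
            $line_b = next_line($fh_b);
        }
        elsif ($order < 0) {
            # Line from A sorts first, so it cannot appear later in B.
            push @only_a, $line_a;
            $line_a = next_line($fh_a);
        }
        else {
            # Line from B sorts first, so it cannot appear later in A.
            push @only_b, $line_b;
            $line_b = next_line($fh_b);
        }
    }

    # Whatever is left in either file has nothing to match against.
    while (defined $line_a) {
        push @only_a, $line_a;
        $line_a = next_line($fh_a);
    }
    while (defined $line_b) {
        push @only_b, $line_b;
        $line_b = next_line($fh_b);
    }

    close $fh_a;
    close $fh_b;
    return (\@matched, \@only_a, \@only_b);
}
```

In a real program you would probably write the results to output files
instead of collecting them in arrays, so that memory use stays small
no matter how large the inputs are.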
--
Jim Gibson
j...@gibson.org