Re: Finding out matching and not matching entries between two files !

Jim Gibson Thu, 16 Jul 2009 07:19:14 -0700

At 2:12 AM -0700 7/16/09, Amit Saxena wrote:

Hi all,


I need help regarding the approach to find out matched and unmatched entries
between two files using perl.

As the number of lines in the files would be around 10k-50k, I don't want to
load entire file contents into memory.

The fastest approach is usually to load the shorter of the two files into memory, then read the longer of the two files and process each line, recording whether the line matches any record in the shorter file. A hash is best for this method. Files of 50k lines should be no problem.
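As a rough sketch of the hash approach (the sub name compare_with_hash and the three-list return value are my own choices for illustration, and I am assuming that "matching" means whole lines are identical once the newline is removed):

```perl
use strict;
use warnings;

# Hypothetical helper, not from any module: compares two files line by line.
# Returns three array refs: lines of the longer file found in the shorter
# file, lines found only in the longer file, lines found only in the shorter.
sub compare_with_hash {
    my ($short_path, $long_path) = @_;

    # Load the shorter file into a hash keyed by line content.
    # The value counts how often the longer file matched that line.
    my %seen;
    open my $short_fh, '<', $short_path
        or die "Cannot open $short_path: $!";
    while (my $line = <$short_fh>) {
        chomp $line;
        $seen{$line} = 0;
    }
    close $short_fh;

    # Stream the longer file one line at a time; it is never held whole.
    my (@matched, @only_long);
    open my $long_fh, '<', $long_path
        or die "Cannot open $long_path: $!";
    while (my $line = <$long_fh>) {
        chomp $line;
        if (exists $seen{$line}) {
            $seen{$line}++;
            push @matched, $line;
        }
        else {
            push @only_long, $line;
        }
    }
    close $long_fh;

    # Anything in the hash that was never matched exists only in the
    # shorter file.
    my @unseen     = grep { $seen{$_} == 0 } keys %seen;
    my @only_short = sort @unseen;

    return (\@matched, \@only_long, \@only_short);
}
```

Calling compare_with_hash('short.txt', 'long.txt') (file names made up) gives back the three lists. If even the result lists are too large to keep around, the push statements could be replaced with prints to output files.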

If you really don't want to or can't read one of the files into memory, then a method that still requires only one pass over each of the two files is to sort the files and save the sorted copies. Then, read one line from each file and compare. If they are equal, record this fact and read the next line from each file. If they do not match, record the fact and read a line from the file with the lesser of the two lines, alphabetically speaking, then compare again. When one file runs out, every line remaining in the other file is unmatched.
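A minimal sketch of that merge-style comparison follows. The names next_line and compare_sorted and the callback interface are hypothetical, and the code assumes both files were already sorted in the same order that Perl's cmp operator uses (plain string order, for example from an external sort run in the C locale):

```perl
use strict;
use warnings;

# Hypothetical helper: read one line and strip the newline; undef at EOF.
sub next_line {
    my ($fh) = @_;
    my $line = <$fh>;
    chomp $line if defined $line;
    return $line;
}

# Hypothetical helper: walks two sorted files in step, holding only one
# line of each in memory. Calls $report->($where, $line) for every line,
# where $where is 'both', 'only_a' or 'only_b'.
sub compare_sorted {
    my ($path_a, $path_b, $report) = @_;

    open my $fh_a, '<', $path_a or die "Cannot open $path_a: $!";
    open my $fh_b, '<', $path_b or die "Cannot open $path_b: $!";

    my $line_a = next_line($fh_a);
    my $line_b = next_line($fh_b);

    while (defined $line_a && defined $line_b) {
        my $order = $line_a cmp $line_b;
        if ($order == 0) {
            # Equal: record the match and advance both files.
            $report->('both', $line_a);
            $line_a = next_line($fh_a);
            $line_b = next_line($fh_b);
        }
        elsif ($order < 0) {
            # Line from A sorts first, so it cannot appear later in B.
            $report->('only_a', $line_a);
            $line_a = next_line($fh_a);
        }
        else {
            $report->('only_b', $line_b);
            $line_b = next_line($fh_b);
        }
    }

    # One file is exhausted; whatever remains in the other is unmatched.
    while (defined $line_a) {
        $report->('only_a', $line_a);
        $line_a = next_line($fh_a);
    }
    while (defined $line_b) {
        $report->('only_b', $line_b);
        $line_b = next_line($fh_b);
    }

    close $fh_a;
    close $fh_b;
    return;
}
```

For example, compare_sorted('a.sorted', 'b.sorted', sub { print "$_[0]: $_[1]\n" }) would print one tagged line per record (the file names are made up). Note that duplicate lines are paired off one for one in this sketch, so a line appearing twice in one file and once in the other is reported once as 'both' and once as unmatched.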


--
Jim Gibson
j...@gibson.org

