Finding matching and non-matching entries between two files

Amit Saxena Thu, 16 Jul 2009 02:12:36 -0700

Hi all,

I need help choosing an approach to find the matched and unmatched entries
between two files using Perl.

As each file will have around 10k-50k lines, I don't want to load the entire
file contents into memory.

The first file (file1, also known as the superset file) contains all the data
in 4 columns, in the format country, state, city and id. The second file
(file2, also known as the subset file) contains some of the data from the
superset file, with the difference that it does not contain all 4 columns: it
contains only 3 (country, state and city).

The following information is needed from these input files:
1. Matched file: lists the rows of the superset file that match the rows of
the subset file.
2. Unmatched file: given all the ids matched for each country-state pair in
the subset file, lists all the rows of the superset file that have the same
country-state pair but none of those ids.

The sample files are shown below.

File 1 (Superset)

Country1,State1,City111,id1
Country1,State1,City112,id2
Country1,State1,City113,id3
Country1,State1,City114,id4
Country1,State1,City115,id5
Country1,State2,City121,id6
Country1,State2,City122,id7
Country1,State2,City123,id8
Country1,State3,City131,id9
Country1,State3,City132,id10


File 2 (subset)

Country1,State1,City111
Country1,State1,City112
Country1,State2,City121
Country1,State3,City131


Matched file
------------

Country1,State1,City111,id1
Country1,State1,City112,id2
Country1,State2,City121,id6
Country1,State3,City131,id9


Unmatched file
--------------


Country1,State1,City113,id3
Country1,State1,City114,id4
Country1,State1,City115,id5
Country1,State2,City122,id7
Country1,State2,City123,id8
Country1,State3,City132,id10
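One common alternative, sketched here in Perl under stated assumptions (plain comma-separated lines with no quoted or embedded commas; `split_matches` and the in-memory sample filehandles are hypothetical names for illustration): load only the subset file's `country,state,city` keys and the country-state pairs they mention into hashes, then stream the superset file once and send each row to the matched output, the unmatched output, or neither. Memory then grows with the size of the subset file rather than the superset file.

```perl
use strict;
use warnings;

# Sketch: hold only the subset file's keys in memory, stream the superset.
# Assumes plain comma-separated lines with no quoted or embedded commas.
sub split_matches {
    my ($superset_fh, $subset_fh, $matched_fh, $unmatched_fh) = @_;

    my %wanted;     # "country,state,city" => 1 for every subset row
    my %pair_seen;  # "country,state" => 1 for every pair named in the subset
    while (my $line = <$subset_fh>) {
        chomp $line;
        my ($country, $state, $city) = split /,/, $line;
        next unless defined $city;    # skip blank or malformed lines
        $wanted{"$country,$state,$city"} = 1;
        $pair_seen{"$country,$state"}    = 1;
    }

    while (my $line = <$superset_fh>) {
        chomp $line;
        my ($country, $state, $city) = split /,/, $line;
        next unless defined $city;
        if ($wanted{"$country,$state,$city"}) {
            print {$matched_fh} "$line\n";      # row is listed in the subset
        }
        elsif ($pair_seen{"$country,$state"}) {
            print {$unmatched_fh} "$line\n";    # same pair, row not listed
        }
        # Rows whose pair never appears in the subset go to neither file.
    }
}

# Demo with the sample data via in-memory filehandles.
my $superset = <<'END';
Country1,State1,City111,id1
Country1,State1,City112,id2
Country1,State1,City113,id3
Country1,State1,City114,id4
Country1,State1,City115,id5
Country1,State2,City121,id6
Country1,State2,City122,id7
Country1,State2,City123,id8
Country1,State3,City131,id9
Country1,State3,City132,id10
END
my $subset = <<'END';
Country1,State1,City111
Country1,State1,City112
Country1,State2,City121
Country1,State3,City131
END

my ($matched, $unmatched) = ('', '');
open my $sup_fh, '<', \$superset  or die "superset: $!";
open my $sub_fh, '<', \$subset    or die "subset: $!";
open my $m_fh,   '>', \$matched   or die "matched: $!";
open my $u_fh,   '>', \$unmatched or die "unmatched: $!";
split_matches($sup_fh, $sub_fh, $m_fh, $u_fh);
close $_ for $m_fh, $u_fh;    # flush the in-memory outputs

print "Matched:\n$matched\nUnmatched:\n$unmatched";
```

With real data, the four filehandles would be opened on file1, file2 and the two output files instead of on in-memory strings. Run against the sample data, it reproduces the Matched and Unmatched lists above.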


Currently, I read the subset file line by line, and whenever the country-state
pair changes, I search the superset file for all records that satisfy the
matching and unmatching conditions.

Please suggest a better approach.
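If even the subset keys should not be held in memory all at once, a merge-style pass is another option. This sketch assumes both files are sorted by country and then state in the same byte order, for example with `LC_ALL=C sort -t, -k1,1 -k2,2`; the names `next_fields`, `cmp_pair` and `merge_by_pair` are hypothetical. Compared with the current approach, it reads each file exactly once instead of searching the superset file again each time the country-state pair changes, and it keeps only the current pair's subset cities in memory.

```perl
use strict;
use warnings;

# Sketch: single pass over both files, which must be sorted by country,
# then state, in the same byte order (e.g. LC_ALL=C sort -t, -k1,1 -k2,2).
# Only the current country-state pair's subset cities are held in memory.

# Next row as an array ref of fields, skipping blank/short lines; undef at EOF.
sub next_fields {
    my ($fh) = @_;
    while (defined(my $line = <$fh>)) {
        chomp $line;
        my @fields = split /,/, $line, -1;
        return \@fields if @fields >= 3;
    }
    return;
}

# Order two rows by (country, state) using plain string comparison.
sub cmp_pair {
    my ($p, $q) = @_;
    return ($p->[0] cmp $q->[0]) || ($p->[1] cmp $q->[1]);
}

sub merge_by_pair {
    my ($superset_fh, $subset_fh, $matched_fh, $unmatched_fh) = @_;

    my $lookahead = next_fields($subset_fh);   # first subset row not yet grouped
    my ($group_pair, %group_cities);           # current subset group

    # Collect all consecutive subset rows sharing the lookahead row's pair.
    my $load_next_group = sub {
        %group_cities = ();
        $group_pair   = $lookahead ? [ @{$lookahead}[0, 1] ] : undef;
        while ($lookahead && cmp_pair($lookahead, $group_pair) == 0) {
            $group_cities{ $lookahead->[2] } = 1;
            $lookahead = next_fields($subset_fh);
        }
    };
    $load_next_group->();

    while (my $row = next_fields($superset_fh)) {
        # Advance past subset groups whose pair sorts before this row's pair.
        $load_next_group->() while $group_pair && cmp_pair($group_pair, $row) < 0;
        # Rows whose pair is absent from the subset go to neither file.
        next unless $group_pair && cmp_pair($group_pair, $row) == 0;

        my $out = $group_cities{ $row->[2] } ? $matched_fh : $unmatched_fh;
        print {$out} join(',', @$row), "\n";
    }
}

# Demo with the (already sorted) sample data via in-memory filehandles.
my $superset = <<'END';
Country1,State1,City111,id1
Country1,State1,City112,id2
Country1,State1,City113,id3
Country1,State1,City114,id4
Country1,State1,City115,id5
Country1,State2,City121,id6
Country1,State2,City122,id7
Country1,State2,City123,id8
Country1,State3,City131,id9
Country1,State3,City132,id10
END
my $subset = <<'END';
Country1,State1,City111
Country1,State1,City112
Country1,State2,City121
Country1,State3,City131
END

my ($matched, $unmatched) = ('', '');
open my $sup_fh, '<', \$superset  or die "superset: $!";
open my $sub_fh, '<', \$subset    or die "subset: $!";
open my $m_fh,   '>', \$matched   or die "matched: $!";
open my $u_fh,   '>', \$unmatched or die "unmatched: $!";
merge_by_pair($sup_fh, $sub_fh, $m_fh, $u_fh);
close $_ for $m_fh, $u_fh;    # flush the in-memory outputs

print "Matched:\n$matched\nUnmatched:\n$unmatched";
```

If the two files are not sorted the same way, rows are skipped without any error, so the sort step matters as much as the Perl code.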

Thanks & Regards,
Amit Saxena