Hi all, I need help with an approach for finding matched and unmatched entries between two files using Perl.
As the files will contain around 10k-50k lines each, I don't want to load their entire contents into memory.

The first file (file1, the superset file) contains all the data in 4 columns: country, state, city and id. The second file (file2, the subset file) contains some of the data from the superset file, but with only 3 of the 4 columns (country, state and city, without the id).

The following information is needed from these input files:

1. Matched file: lists the rows of the superset file that match the rows of the subset file.
2. Unmatched file: for each country-state pair in the subset file, take the ids of its matched rows, then list all rows of the superset file that have the same country-state pair but none of those ids.

The sample files are shown below.

File 1 (Superset)
Country1,State1,City111,id1
Country1,State1,City112,id2
Country1,State1,City113,id3
Country1,State1,City114,id4
Country1,State1,City115,id5
Country1,State2,City121,id6
Country1,State2,City122,id7
Country1,State2,City123,id8
Country1,State3,City131,id9
Country1,State3,City132,id10

File 2 (subset)
Country1,State1,City111
Country1,State1,City112
Country1,State2,City121
Country1,State3,City131

Matched file
------------
Country1,State1,City111,id1
Country1,State1,City112,id2
Country1,State2,City121,id6
Country1,State3,City131,id9

Unmatched file
--------------
Country1,State1,City113,id3
Country1,State1,City114,id4
Country1,State1,City115,id5
Country1,State2,City122,id7
Country1,State2,City123,id8
Country1,State3,City132,id10

As of now, I am reading the subset file line by line, and whenever the country-state pair changes, I find all records in the superset file that satisfy the matched and unmatched conditions.

Please suggest a better approach.

Thanks & Regards,
Amit Saxena
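
For reference, a minimal sketch of one possible alternative to the pair-by-pair rescan described above. This is not the approach currently in use. It keeps only the subset file's keys in a hash and then streams the superset file once, line by line, so the superset is never held in memory. It assumes plain comma-separated fields with no quoting or embedded commas, that a row matches when country, state and city are all equal, and that the subset keys fit in memory (which seems plausible at 10k-50k lines). The sub name split_superset and all file names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: classify each superset row as matched or unmatched.
# Returns the number of rows written to each output file.
sub split_superset {
    my ($superset_path, $subset_path, $matched_path, $unmatched_path) = @_;

    # Pass 1: remember every subset row and every country-state pair.
    my (%subset_row, %subset_pair);
    open my $sub_fh, '<', $subset_path
        or die "Cannot open $subset_path: $!";
    while (my $line = <$sub_fh>) {
        $line =~ s/\r?\n\z//;                 # strip LF or CRLF line ending
        my @f = split /,/, $line, -1;
        next if @f < 3;                       # skip blank or short lines
        $subset_row{ join ',', @f[0 .. 2] } = 1;
        $subset_pair{ join ',', @f[0 .. 1] } = 1;
    }
    close $sub_fh;

    # Pass 2: stream the superset file once, writing rows as they are read.
    open my $sup_fh, '<', $superset_path
        or die "Cannot open $superset_path: $!";
    open my $match_fh, '>', $matched_path
        or die "Cannot open $matched_path: $!";
    open my $unmatch_fh, '>', $unmatched_path
        or die "Cannot open $unmatched_path: $!";

    my ($matched, $unmatched) = (0, 0);
    while (my $line = <$sup_fh>) {
        $line =~ s/\r?\n\z//;
        my @f = split /,/, $line, -1;
        next if @f < 4;                       # expect country,state,city,id
        if ($subset_row{ join ',', @f[0 .. 2] }) {
            print {$match_fh} "$line\n";      # same country, state and city
            $matched++;
        }
        elsif ($subset_pair{ join ',', @f[0 .. 1] }) {
            print {$unmatch_fh} "$line\n";    # pair is known, row is not
            $unmatched++;
        }
        # Rows whose country-state pair never appears in the subset are skipped.
    }
    close $sup_fh;
    close $match_fh   or die "Cannot close $matched_path: $!";
    close $unmatch_fh or die "Cannot close $unmatched_path: $!";

    return ($matched, $unmatched);
}
```

It could be called as split_superset('file1.csv', 'file2.csv', 'matched.csv', 'unmatched.csv'). On the sample data above this should write 4 matched and 6 unmatched rows, in superset order. If the real files contain quoted fields, a CSV parser such as Text::CSV would be a safer choice than split.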