rubinsta wrote:
> Hello,
>
> I'm a Perl uber-novice and I'm trying to compare two files in order to
> exclude items listed on one file from the complete list on the other
> file. What I have so far prints out a third file listing everything
> that matches the exclude file from the complete file (which I'm hoping
> will be a duplicate of the exclude file) just so I can make sure that
> the comparison script is working. The files are lists of numbers
> separated by newlines. The exclude file has 333 numbers and the
> complete file has 9000 numbers.
>
> Here's what I have so far:
>
> #!/usr/bin/perl
> use strict;
> use warnings;
>
> open(ALL, "all.txt") or die $!;
> open(EX, "exclude.txt") or die $!;
> open(OUT,'>exTest.txt') or die $!;
>
> my @ex_lines = <EX>;
> my @all_lines = <ALL>;
>
> foreach $all (@all_lines){
>   foreach $ex (@ex_lines){
>     if ($ex =~ /(^$all)/){

The lines you have read from the input files are unchomped (they still
include the trailing newline character) and there is no allowance for
leading or trailing whitespace. Are you sure of your input data?

The regex has an unnecessary capture (the parentheses) and isn't
anchored at the end of the string, although leaving the record
separator at the end of $ex and $all has a similar effect. It should
really be simply

  if ($ex eq $all)

>       print OUT $1;

The two strings are equal, so

  print OUT $all;

>     }
>   }
> }
> close(ALL);
> close(EX);

Explicit calls to close are pointless unless the return status is
checked. All open filehandles will be closed by Perl when it finishes
processing the script. (Even if an input file doesn't close cleanly,
the damage has already been done when an earlier read failed. If a
volume is dismounted while the program is running, for example, then
without explicit handling of read errors the file will simply appear
to be shorter than its true length.)

> close(OUT);

There's no need to close output files unless you're in a fragile
environment, or unless it is vital that the output information is
complete. For instance, it may be useful to write

  close $output or die $!;
  unlink 'input.txt';

so that the source data is discarded only if the output data has been
safely written.

> I realize the nested foreach loops are ugly but I don't know enough to
> navigate the filehandles, which as I understand, can only be assigned
> to variables in their entirety as an array. Any thoughts on how this
> might be done?

You should try to solve the problem instead of solving the data.
Nearly all of your code is about opening, reading, and closing files.
Your solution amounts to:

  if any of the lines in ALL match any of the lines in EX
  then print (it)

Given just the idea of the data, can you improve on that? For
instance, if one or both of the input files are sorted then you may
not need to reassess all of the lines for each comparison.

Or if the lines could occur more than once in either or both files,
then it may be an idea to maintain a record of which comparisons have
already been made.

Those ideas are independent of Perl, or indeed of any programming
language. After that, the line blurs. Programming languages are useful
thinking tools for imagining programming solutions, just as natural
languages are useful for life's challenges. An idea expressed in Latin
can be impossible to recreate intact in French, just as a solution in
Forth can be inexpressible in C++. But despite its blurriness the line
is narrow, so have courage and dash across it into the implementation,
where all languages have ways to open, close, read and write files;
ways to handle numbers and strings; conveniences for arrays and
constants and, God forbid, error handling.

But I encourage you to start at the beginning, and if common sense is
more familiar to you than Perl or any other programming language then
use that. Your imagination is your best tool. If you were given two
piles of line printer paper and were told to find the differences:

- what questions would you ask about the problem?
- how would you go about it?
- what would you want to know about the contents?

Once you know the answers, you have a solution. Then you can code it,
given knowledge of the language at hand.

Many things will change the solution, just as you would do things
differently if you had only two sheets of paper to compare rather than
a two-inch-thick stack; if you had to do it every day rather than
leaving it for somebody else's turn in ten years' time; or if it was
obvious that all of the lines on one stack of paper were the same
except for a few changes. You get the idea?

But unless it is easier for you to formulate solutions in Perl or any
other language, imagine a real-world equivalent and use common sense.
Then just code it, and we will help.
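
To make the points above concrete, here is one possible shape for the
code once you have decided what you need. It is only a sketch under
assumptions that are mine, not yours: the items are compared as
strings (so "007" and "7" are different), blank lines are ignored,
and the file names are taken from the command line. The names
read_items and partition_items are hypothetical helpers made up for
this example. It chomps and trims each line, keeps a hash as the
record of what is to be excluded so that each line of the complete
list is looked up once instead of being compared against every
exclude line, and checks the status of every close.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read one item per line, dropping the newline,
# any surrounding whitespace, and blank lines.
sub read_items {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    my @items;
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/^\s+|\s+$//g;    # trim leading and trailing whitespace
        push @items, $line if length $line;
    }
    close($fh) or die "Cannot close $path: $!";
    return @items;
}

# Hypothetical helper: split the complete list into the items that are
# in the exclude list and the items that are not. Hash keys are
# strings, so this behaves like a comparison with eq.
sub partition_items {
    my ($all, $exclude) = @_;
    my %is_excluded;
    $is_excluded{$_} = 1 for @$exclude;    # the record of what to exclude
    my (@matched, @kept);
    for my $item (@$all) {
        if ($is_excluded{$item}) {
            push @matched, $item;
        }
        else {
            push @kept, $item;
        }
    }
    return (\@matched, \@kept);
}

# Run only when two file names are given on the command line.
if (@ARGV == 2) {
    my ($all_path, $exclude_path) = @ARGV;
    my @all     = read_items($all_path);
    my @exclude = read_items($exclude_path);
    my ($matched, $kept) = partition_items(\@all, \@exclude);

    print "$_\n" for @$kept;
    close(STDOUT) or die "Cannot write output: $!";
    warn scalar(@$matched), " of ", scalar(@all), " items were excluded\n";
}
```

Saved under a name of your choosing, say exclude.pl, it could be run
as

  perl exclude.pl all.txt exclude.txt > kept.txt

The count printed on STDERR gives you the same sanity check that your
exTest.txt file was meant to provide. If the files turn out to be
sorted, or too large to hold in memory, a different approach may suit
better.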

HTH,

Rob