----- Original Message -----
From: "Andrej Kastrin" <[EMAIL PROTECTED]>
Newsgroups: perl.beginners
To: "Perl Beginners List" <beginners@perl.org>
Sent: Wednesday, December 07, 2005 12:00 PM
Subject: Extract text from file


Hello dears,

I have a file in raw data format that stores different terms (e.g. genes), one per line, and looks like:
------------
ABH
HD
HDD
etc.
------------

Then I have a second file, which looks like:
--------------------------------------------------------------
ID-  001 #ID number
TI-   analysis of HD patients. #title of article
AB- The present article deals with HD patients. #abstract

ID-  002 #ID number
TI-   In reply to analysis of HD patients. #title of article
AB- The present article deals with HDD patients. #abstract
--------------------------------------------------------------
and so on, where records are separated by a blank line.

Now I have to extract the ID, TI and AB fields of every record in the second file that contains any of the terms from the first file.

A colleague from the BioPerl mailing list helped me with the following code:

#!/usr/bin/perl

use strict;
use warnings;

my $file_terms = shift;
my $file_medline = shift;
open (TERM, $file_term) or die "Can't open TERM"; #open list of terms
open (MEDL, $file_medline) or die "Can't open MEDL"; #open records file

my @terms = <TERM>;

chomp(my @terms = <TERM>);


while (my ($pmid, $ti, $ab) = split <MEDL>) {

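This line doesn't work. split takes the form: split /PATTERN/, EXPR, and it does not read from a filehandle.
Even if split were written correctly, it would not give you $pmid, $ti and $ab here: with the default $/ each read of <MEDL> returns a single line, while each record spans three lines.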
See: perldoc -f split
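To illustrate the point, here is a minimal sketch of where split does fit: once a whole record is in a string, split can break it into lines and each line into a tag and a value. It assumes the record has already been read (the first sample record is hard-coded in $record, without the #comments) and that every field sits on one line of the form TAG- value; the hash %field is only an illustrative name.

```perl
use strict;
use warnings;

# One record from the second file (sample data, #comments removed).
my $record = "ID-  001\nTI-   analysis of HD patients.\nAB- The present article deals with HD patients.\n";

my %field;
# split /PATTERN/, EXPR: first break the record into lines ...
for my $line (split /\n/, $record) {
    # ... then split each line on the first '-' and the spaces after it;
    # the limit of 2 keeps any later '-' inside the value
    my ($tag, $value) = split /-\s*/, $line, 2;
    $field{$tag} = $value;
}

# A hash slice pulls the three fields out in a fixed order.
my ($pmid, $ti, $ab) = @field{qw(ID TI AB)};
print "$pmid\t$ti\t$ab\n";
```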

for my $term (@terms) {
if (/$term/ for ($pmid, $ti, $ab)) {

You can't use a 'for' loop as the condition of an if statement: a loop (even the statement-modifier form EXPR for LIST) is a statement, not an expression.
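The test the if statement was reaching for can be written with grep, which counts the matching elements and so is true in the if condition when at least one field matches. A minimal sketch, with the three field values hard-coded from the first sample record; \Q...\E makes each term match literally rather than as a regex, and $matched is a hypothetical variable added only to record which term hit.

```perl
use strict;
use warnings;

my @terms = ('ABH', 'HD', 'HDD');
# Field values from the first sample record.
my ($pmid, $ti, $ab) = ('001', 'analysis of HD patients.',
                        'The present article deals with HD patients.');

my $matched;
for my $term (@terms) {
    # grep returns the number of fields matching the term;
    # any non-zero count makes the condition true
    if (grep { /\Q$term\E/ } $pmid, $ti, $ab) {
        print "$pmid\t$ti\t$ab\n";
        $matched = $term;
        last;    # one matching term is enough
    }
}
```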

print "$pmid\t$ti\t$ab";
}
}
}

I'm a little confused now, because the above example doesn't work and I don't know why (compilation errors on lines 15 and 19).
I'm still learning...

Thanks for any suggestions, Andrej
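Apart from the split and for problems pointed out above, two other things in that script will cause trouble. Under use strict, the first open uses $file_term, but the variable declared is $file_terms, so Perl stops at compile time with a "Global symbol ... requires explicit package name" error. And the terms file is read twice: my @terms = <TERM> already consumes every line, so the later chomp(my @terms = <TERM>) gets an empty list (use warnings also reports that the second "my" masks the earlier declaration). The sketch below shows the read-twice effect on an in-memory filehandle, where the string $data is an assumed stand-in for the terms file, and then the single read that avoids it.

```perl
use strict;
use warnings;

# Stand-in for the terms file (sample terms from the post).
my $data = "ABH\nHD\nHDD\n";

# Reading a filehandle in list context consumes every remaining line ...
open my $fh, '<', \$data or die "Can't open in-memory file: $!";
my @first  = <$fh>;
# ... so a second list-context read of the same handle returns nothing.
my @second = <$fh>;
close $fh;

# Read once, and chomp in the same statement instead.
open $fh, '<', \$data or die "Can't open in-memory file: $!";
chomp(my @terms = <$fh>);
close $fh;

print scalar(@first), ' ', scalar(@second), " @terms\n";
```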

I think the program below will give you the results you want. It also prints the records from the second file, $file_medline, in their original format. I don't know whether you really want the output lines to be tab-separated, as in the print statement in your code.

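#!/usr/bin/perl
use strict;
use warnings;

# read the terms, one per line, skipping blank lines
# (an empty pattern would silently reuse the last successful match)
open my $term_fh, '<', 'o33.txt' or die "Can't open o33.txt: $!";
chomp(my @terms = grep { /\S/ } <$term_fh>);
close $term_fh or die $!;

open my $medl_fh, '<', 'o44.txt' or die "Can't open o44.txt: $!";

{ # enclose these statements in a block so that the change to $/ is confined to them
    local $/ = "\n\n";    # set the input record separator to a single blank line
    while (<$medl_fh>) {
        for my $term (@terms) {
            if (/\Q$term\E/) {    # \Q...\E matches the term literally, not as a regex
                print;
                last;    # leave the 'for' loop when the first term is found; no need to check the rest
            }
        }
    }
}
close $medl_fh or die $!;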
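One possible variation, sketched here as an alternative rather than a correction: join all terms into a single precompiled regex, read the records in paragraph mode ($/ = ''), which treats one or more blank lines as the record separator, and match whole words with \b. The word boundaries are an assumption about what you want: with them, HD no longer matches inside HDD, and they only behave as expected for terms that start and end with a letter or digit. The helper matching_records is hypothetical, and the files are replaced by in-memory strings built from the sample data, with the term list cut down to ABH and HDD so the word-boundary effect is visible.

```perl
use strict;
use warnings;

# Hypothetical helper: return the records read from $medline_fh that
# mention any of the terms in @$terms as a whole word.
sub matching_records {
    my ($terms, $medline_fh) = @_;

    my @words = grep { length $_ } @$terms;    # ignore blank entries
    return () unless @words;

    # One precompiled alternation; quotemeta makes each term match literally.
    my $alt = join '|', map { quotemeta($_) } @words;
    my $re  = qr/\b(?:$alt)\b/;

    local $/ = '';    # paragraph mode: one or more blank lines end a record
    my @hits;
    while (my $record = <$medline_fh>) {
        push @hits, $record if $record =~ $re;
    }
    return @hits;
}

# In-memory stand-ins for o33.txt and o44.txt (sample data from the post).
my $terms_data   = "ABH\nHDD\n";
my $medline_data = "ID-  001\nTI-   analysis of HD patients.\n"
                 . "AB- The present article deals with HD patients.\n\n"
                 . "ID-  002\nTI-   In reply to analysis of HD patients.\n"
                 . "AB- The present article deals with HDD patients.\n";

open my $term_fh, '<', \$terms_data or die "Can't open terms: $!";
chomp(my @terms = <$term_fh>);
close $term_fh;

open my $medl_fh, '<', \$medline_data or die "Can't open records: $!";
my @hits = matching_records(\@terms, $medl_fh);
close $medl_fh;

print @hits;    # only record 002: with \b, HDD does not match the bare HD in 001
```

Scanning each record once against one combined pattern avoids looping over @terms for every record; for a short term list either approach works.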


Chris

