Oops!  

Try changing

   if($find =~ /^>probe:\w+:(\w+):/)

to

   if($line =~ /^>probe\:\w+\:$find\:/) {



I can't remember if you have to escape colons or not.  If you do, then
you're probably a pearlfish.
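
For what it's worth, a colon is not a regex metacharacter in Perl, so the
escaped and unescaped forms should match the same headers.  Here is a small
sketch using a sample header from the original post.  The header_has_id
helper name is made up for illustration, and the \Q...\E quoting is an
addition that was not in the suggestion above; it guards against IDs that
happen to contain regex metacharacters.

```perl
use strict;
use warnings;

# Hypothetical helper: true (1) if a FASTA header line names the given probe set ID.
sub header_has_id {
    my ($line, $find) = @_;
    # \Q...\E quotes any regex metacharacters that might appear in $find.
    return $line =~ /^>probe:\w+:\Q$find\E:/ ? 1 : 0;
}

my $header = '>probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631; Antisense;';

# Escaped and unescaped colons behave identically.
my $plain   = $header =~ /^>probe:\w+:1138_at:/    ? 1 : 0;
my $escaped = $header =~ /^>probe\:\w+\:1138_at\:/ ? 1 : 0;

print "plain=$plain escaped=$escaped helper=", header_has_id($header, '1138_at'), "\n";
```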



-----Original Message-----
From: Michael Oldham [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, June 13, 2006 7:13 PM
To: Timothy Johnson; [EMAIL PROTECTED] Org
Subject: RE: A loop to parse a large text file--output is empty!

Thanks Timothy.  I tried the code you supplied and unfortunately the
output file is still empty.  Do you think there might be a problem with
the regular expression in:

if($find =~ /^>probe:\w+:(\w+):/)

?

Mike



-----Original Message-----
From: Timothy Johnson [mailto:[EMAIL PROTECTED]
Sent: Tuesday, June 13, 2006 6:59 PM
To: Michael Oldham; [EMAIL PROTECTED] Org
Subject: RE: A loop to parse a large text file--output is empty!




One problem is that you are using the $_ variable twice.
"while(<FILE>)" assigns the current line to $_, and "foreach(@array)"
aliases $_ to the current element of the array in question, so inside
the inner loop your regex is tested against an ID, not against the line
from the file.

It's usually a good idea to be more explicit anyway, and keep the $_
usage to a minimum so you don't have to worry about this kind of thing.
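
To make the $_ point concrete, here is a minimal sketch.  It loops over a
plain array instead of a filehandle to stay self-contained, and the sample
data and variable names are invented for illustration.

```perl
use strict;
use warnings;

my @lines = ("first line\n", "second line\n");
my @ids   = ('A', 'B');

# Implicit $_: inside the inner loop, $_ is an element of @ids, not a line.
my @seen_implicit;
for (@lines) {
    for (@ids) {
        push @seen_implicit, $_;
    }
}

# Named variables: the line and the ID are both visible in the inner loop.
my @seen_named;
for my $line (@lines) {
    for my $id (@ids) {
        chomp(my $text = $line);
        push @seen_named, "$id:$text";
    }
}

print "implicit: @seen_implicit\n";
print "named:    @seen_named\n";
```

With the implicit form, only the IDs are ever seen in the inner loop; the
current line is hidden until that loop finishes.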

Also, I'm not sure what you're trying to accomplish by this line:

   print OUT scalar(<PROBES>);

As far as I can see, you're reading the next line from PROBES in scalar
context and printing it, without ever assigning it to $_.  I'm assuming
that you actually wanted to print the line you read instead, so that's
what I did.
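
For reference, a small sketch of what scalar(<FH>) does.  It uses an
in-memory filehandle holding one record from the original post in place of
the real probe file; that substitution is an assumption made only to keep
the example self-contained.

```perl
use strict;
use warnings;

# In-memory file standing in for the probe file (one record from the post).
my $data = ">probe:HG_U95Av2:107_at:543:519; Interrogation_Position=258; Antisense;\n"
         . "CTACTCTCGTGGTGCACAAGGAGTG\n";
open(my $fh, '<', \$data) or die "Can't open in-memory file: $!";

my $header   = <$fh>;          # scalar assignment: reads exactly one line
my $sequence = scalar(<$fh>);  # explicit scalar context: reads the following line

close($fh);
print $header, $sequence;
```

Without scalar(), a readline inside print's argument list is in list
context and would return every remaining line.  So the original line may
well have been meant to print the sequence that follows a matching header.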

Try this and see if it is closer to what you want:

###################

#!/usr/bin/perl -w

use strict;

my $IDs = 'ID.txt';

open(IDFILE, '<', $IDs) or die "Could not open file $IDs: $!";

my $probes = 'HG_U95Av2_probe_fasta.txt';

open(PROBES, '<', $probes) or die "Could not open file $probes: $!";

open (OUT,'>','probe_subset.txt') or die "Can't write output: $!";

my @ID = <IDFILE>;
chomp @ID;

#            vvvvvvvvvvvvvvvvvvv
        while (my $line = <PROBES>) {
#                   vvvvvvvv
                foreach my $find(@ID) {
                        if($line =~ /^>probe:\w+:$find:/) {
                                print OUT $find."\n";
#                                 VVVVVV
                                print OUT $line;   # $line still ends with its own newline
                        }
                }

        }
exit;

###########################
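
An alternative sketch, under a few assumptions that are not confirmed in
this thread: each header sits on a single line, the sequence is the line
right after it, and ID.txt may carry Windows line endings (the script is
run under Cygwin).  It captures the probe set ID from each header and looks
it up in a hash instead of looping over all 8,175 IDs per line.  The
extract_probes name and the in-memory sample data are invented for
illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: copy header + sequence for every wanted probe set ID.
sub extract_probes {
    my ($in, $out, $wanted) = @_;
    my $count = 0;
    while (my $line = <$in>) {
        # Capture the probe set ID (third colon-separated field of the header).
        next unless $line =~ /^>probe:\w+:(\w+):/;
        next unless $wanted->{$1};
        my $sequence = <$in>;          # assumes the sequence is the next line
        last unless defined $sequence;
        print {$out} $line, $sequence;
        $count++;
    }
    return $count;
}

# IDs as they might come from ID.txt; the \r mimics a Windows line ending.
my @ids = ("1138_at\r\n", "1156_at\r\n");
s/\r?\n\z// for @ids;                  # strips "\n" and "\r\n", unlike plain chomp
my %wanted = map { $_ => 1 } @ids;

my $fasta = join '',
    ">probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631; Antisense;\n",
    "TGGCTCCTGCTGAGGTCCCCTTTCC\n",
    ">probe:HG_U95Av2:107_at:543:519; Interrogation_Position=258; Antisense;\n",
    "CTACTCTCGTGGTGCACAAGGAGTG\n";

open(my $in, '<', \$fasta) or die "Can't read sample: $!";
my $result = '';
open(my $out, '>', \$result) or die "Can't write sample: $!";
my $count = extract_probes($in, $out, \%wanted);
close($out);
close($in);

print "Copied $count probe(s):\n$result";
```

If a stray "\r" is left on each ID, the ID can never match a header, which
would be one possible reason for an empty output file.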

-----Original Message-----
From: Michael Oldham [mailto:[EMAIL PROTECTED]
Sent: Tuesday, June 13, 2006 6:42 PM
To: [EMAIL PROTECTED] Org
Subject: A loop to parse a large text file--output is empty!

Dear all,

I am a Perl newbie struggling to accomplish a conceptually simple
bioinformatics task.  I have a single large file containing about
200,000 DNA probe sequences (from an Affymetrix microarray), each of
which is accompanied by a header, like so (this is in FASTA format):

>probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631; Antisense;
TGGCTCCTGCTGAGGTCCCCTTTCC
>probe:HG_U95Av2:107_at:543:519; Interrogation_Position=258; Antisense;
CTACTCTCGTGGTGCACAAGGAGTG
>probe:HG_U95Av2:1156_at:528:483; Interrogation_Position=2054; Antisense;
TGCAGGTGGCAGATCTGCAGTCCAT
>probe:HG_U95Av2:1102_s_at:541:589; Interrogation_Position=4316; Antisense;
GTGAAGGTTGCTGAGGCTCTGACCC

.........etc.

What I would like to do is extract from this file a subset of ~130,800
probes (both the header and the sequence) and output this subset into a
new text file in the same (FASTA) format.  These 130,800 probes
correspond to 8,175 probe set IDs ("1138_at" is an example of a probe
set ID in the header listed above).  I have these 8,175 IDs listed in a
separate file called "ID.txt" and the 200,000 probe sequences in a file
called "HG_U95Av2_probe_fasta.txt".  The script below is missing
something because the output file ("probe_subset.txt") is blank.  This
is also the case if I replace the file "ID.txt" with a file consisting
of a single probe set ID (e.g. 1138_at).  Does anyone know what I am
missing?  I am running this script in Cygwin on Windows XP.  I
appreciate any suggestions!

~ Mike O.

#!/usr/bin/perl -w

use strict;

my $IDs = 'ID.txt';

unless (open(IDFILE, $IDs)) {
        print "Could not open file $IDs!\n";
        }

my $probes = 'HG_U95Av2_probe_fasta.txt';

unless (open(PROBES, $probes)) {
        print "Could not open file $probes!\n";
        }

open (OUT,'>','probe_subset.txt') or die "Can't write output: $!";

my @ID = <IDFILE>;
chomp @ID;

        while (<PROBES>) {
                foreach (@ID) {
                        if(/^>probe:\w+:(\w+):/) {
                                print OUT;
                                print OUT scalar(<PROBES>);
                        }
                }

        }
exit;
--
No virus found in this outgoing message.
Checked by AVG Free Edition.
Version: 7.1.394 / Virus Database: 268.8.4/363 - Release Date: 6/13/2006


--
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
<http://learn.perl.org/> <http://learn.perl.org/first-response>







