Michael Oldham on Friday, 23 June 2006 18:20:
> Hello again,

Hi again, Michael,

> Thanks to everyone for their helpful suggestions.  I finally got it to
> work, using the following script.  However, it takes about 5 hours to
> run on a fast computer.  Using grep (in bash), on the other hand, takes
> about 5 minutes (see below if you are interested).  Thanks again!
>
> SLOW perl script:
>
> #!/usr/bin/perl -w
>
> use strict;
>
> my $IDs = 'ID_all_X';
>
> unless (open(IDFILE, $IDs)) {
>       print "Could not open file $IDs!\n";
>       }
>
> my $probes = 'HG_U95Av2_probe_fasta';
>
> unless (open(PROBES, $probes)) {
>       print "Could not open file $probes!\n";
>       }
>
> open (OUT,'>','probe_subset.txt') or die "Can't write output: $!";
>
> my @ID = <IDFILE>;
> print @ID;
> chomp @ID;
>
> while (my $line = <PROBES>) {
>       foreach my $identifier (@ID) {
>               if($line=~/^>probe:\w+:$identifier:/) {
>                               print OUT $line;
>                               print OUT scalar(<PROBES>);
>               }
>       }
> }
> exit;
[...]


Here's a skeleton of how I would do it. It should run quite fast,
because each line needs only one hash lookup instead of a loop over
all IDs. Building the lookup hash from the ID file and using real
input/output files are left out.

Hope this helps,
Dani


#!/usr/bin/perl
use strict;
use warnings;

# just a dummy, must be built from file.
#
my %lookup=map {$_=>1} ('1138_at','1102_s_at');

# This is the only regex we have to use. 
# Maybe split is faster, didn't test it.
#
my $re=qr/^.*?:.*?:(.*?):/;

# extract the target string for the selection test,
# test, and print if test ok
#
while (<DATA>) {
   print $_ if /$re/ and exists $lookup{$1};
}

__DATA__
>probe:HG_U95Av2:1138_at:395:301; 
Interrogation_Position=2631;Antisense;TGGCTCCTGCTGAGGTCCCCTTTCC
>probe:HG_U95Av2:107_at:543:519; Interrogation_Position=258; 
Antisense;CTACTCTCGTGGTGCACAAGGAGTG
>probe:HG_U95Av2:1156_at:528:483; 
Interrogation_Position=2054;Antisense;TGCAGGTGGCAGATCTGCAGTCCAT
>probe:HG_U95Av2:1102_s_at:541:589; 
Interrogation_Position=4316;Antisense;GTGAAGGTTGCTGAGGCTCTGACCC

