> From: Nathalie Conte [mailto:n...@sanger.ac.uk]
> Sent: Friday, September 30, 2011 9:38 AM
> To: beginners@perl.org
> Subject: parsing script removing some lines help please
> 
> 
> 
> Hi,
> I am lost in my script and need some basic help, please.
> I have a tab-separated file whose first column contains a
> chromosome number, followed by several other columns with different info.
> Basically, I am trying to create a script that takes a file (see
> example), parses it line by line and, when the first column starts with
> any of the chromosomes I don't want (6, 8, 14, 16, 18, Y), goes to the
> next line; if the line doesn't start with a bad chromosome, it prints the
> whole line to a new output file.
> The script below just reprints the same original file :(
> Thanks for any clues,
> Nat
> 
> 
> 
> #!/software/bin/perl
> use warnings;
> use strict;
> open(IN, "<example.txt") or die( $! );
> open(OUT, ">>removed.txt") or die( $! );
> my @bad_chromosome=(6,8,14,16,18,Y);
> while(<IN>){
>     chomp;
>     my @column=split /\t/;
>     foreach my $chr_no(@bad_chromosome){
>         if ($column[0]==$chr_no){
>             next;
>         }
>     }
>     print OUT $column[0],"\t",$column[1],"\t",$column[2],"/",$column[3],"\t",
>         $column[4],"\t",$column[5],"\t",$column[6],"\t",$column[7],"\t",
>         $column[8],"\t",$column[9],"\t",$column[10],"\t",$column[11],"\t",
>         $column[12],"\t",$column[13],"\t",$column[14],"\n";
> }
> 
> 
> 
> close IN; close OUT;
> 
John has provided good advice on this problem, but I wanted to add a couple
of things.
To avoid explicitly coding the foreach loop for @bad_chromosome, you could
use the 'grep' function.
Also, if you are just reprinting the input line, print $_ rather than
rebuilding it column by column.

unless ( grep { $column[0] eq $_ } @bad_chromosome ) {
   print OUT "$_\n";   # or: print $OUT "$_\n"; if $OUT is a lexical filehandle, as John suggested
}
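Dropping that test into the original loop gives a complete script. The sketch below is one way to do it, assuming lexical filehandles and three-argument open; it also quotes 'Y', since a bare Y is a compile error under use strict. To make it easy to try, the loop lives in a hypothetical filter_chromosomes sub and the demo runs it on in-memory filehandles; the file names in the comment are the ones from the question.

```perl
use strict;
use warnings;

# 'Y' must be quoted: a bare Y is a compile error under "use strict".
my @bad_chromosome = ( 6, 8, 14, 16, 18, 'Y' );

# Hypothetical helper: copy each line from $in to $out unless its first
# tab-separated column is one of @bad. Returns the number of lines kept.
sub filter_chromosomes {
    my ( $in, $out, @bad ) = @_;
    my $kept = 0;
    while (<$in>) {
        chomp;
        my @column = split /\t/;
        # eq rather than ==, so 'Y' compares correctly and without warnings
        unless ( grep { $column[0] eq $_ } @bad ) {
            print {$out} "$_\n";    # the outer $_ is the line again here
            $kept++;
        }
    }
    return $kept;
}

# Demo on in-memory filehandles. In the real script you would instead write:
#   open my $IN,  '<',  'example.txt' or die "Cannot read example.txt: $!";
#   open my $OUT, '>>', 'removed.txt' or die "Cannot append to removed.txt: $!";
my $input  = join '', map { "$_\n" } "1\tA\tB", "6\tC\tD", "Y\tE\tF", "X\tG\tH";
my $output = '';
open my $IN,  '<', \$input  or die "Cannot open input: $!";
open my $OUT, '>', \$output or die "Cannot open output: $!";
my $kept = filter_chromosomes( $IN, $OUT, @bad_chromosome );
close $IN;
close $OUT;
print $output;    # the lines for chromosomes 1 and X
```

Because the kept line is reprinted from $_, it comes out exactly as it was read, with its original tabs.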

In scalar (boolean) context, the grep call returns the number of elements
of @bad_chromosome that $column[0] matched. So if there is a match, the
count is non-zero and grep evaluates to 'true'; otherwise it returns 0,
which is 'false'.
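A quick way to see the count is to assign grep's result to a scalar. The list below is the one from the question, with 'Y' quoted; the variable names are only for illustration.

```perl
use strict;
use warnings;

my @bad_chromosome = ( 6, 8, 14, 16, 18, 'Y' );

# In scalar context grep returns how many elements the block was true for.
my $hits_y = grep { $_ eq 'Y' } @bad_chromosome;    # 1
my $hits_x = grep { $_ eq 'X' } @bad_chromosome;    # 0

# Which is why grep can be used directly as a boolean test.
print "Y is bad\n"         if grep { $_ eq 'Y' } @bad_chromosome;
print "X is not bad\n" unless grep { $_ eq 'X' } @bad_chromosome;
```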

Using grep does have a drawback, though it matters little unless you have
a lot of values in @bad_chromosome: it always checks every value of
@bad_chromosome, even after a match. An explicit loop with 'if ... next'
(using a loop label, so that the next applies to the outer loop over the
lines) stops looking as soon as a match is found.
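Here is a sketch of the early-exit version on some made-up sample lines, with a counter added only to show how many comparisons are made. It also shows the fix the posted script needed: the next has to name the outer loop (the hypothetical LINE label here). The unlabeled next in the original only moved on to the next element of @bad_chromosome, so no line was ever skipped.

```perl
use strict;
use warnings;

my @bad_chromosome = ( 6, 8, 14, 16, 18, 'Y' );
my @lines          = ( "1\tA", "16\tB", "Y\tC", "2\tD" );    # sample data
my @kept;
my $comparisons = 0;    # only here to show the early exit

LINE: foreach my $line (@lines) {
    my ($chr) = split /\t/, $line;
    foreach my $bad (@bad_chromosome) {
        $comparisons++;
        # 'next LINE' restarts the outer loop with the next line; a plain
        # 'next' would only move on to the next element of @bad_chromosome.
        next LINE if $chr eq $bad;
    }
    push @kept, $line;    # reached only when no bad chromosome matched
}

print "$_\n" for @kept;
print "comparisons: $comparisons\n";
```

With this input the loop makes 22 comparisons, where grep would always make 24 (4 lines times 6 values). If the list of unwanted chromosomes ever gets large, a hash avoids scanning altogether: build it once with `my %is_bad = map { $_ => 1 } @bad_chromosome;` and then test each line with `next if $is_bad{$column[0]};`.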

If you are wondering about the use of $_ inside the grep block: there, $_
is temporarily aliased to each element of @bad_chromosome in turn, and the
$_ that contains the data read from the file is restored when grep finishes.
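The short demo below, with a made-up line standing in for one read from the file, shows both halves of that: the outer $_ survives the grep, and because $_ is an alias to each element rather than a copy, assigning to it inside a grep block changes the array element itself.

```perl
use strict;
use warnings;

my @bad_chromosome = ( 6, 8, 14, 16, 18, 'Y' );

$_ = "16\tsome\tdata";    # stands in for a line read by while (<IN>)
my @column = split /\t/;

# Inside the block, $_ is each element of @bad_chromosome in turn.
my $is_bad = grep { $column[0] eq $_ } @bad_chromosome;

# Afterwards the outer $_ (the line) is back, untouched.
print "is_bad=$is_bad, line=$_\n";

# $_ is an alias, not a copy: assigning to it inside grep changes the
# array element itself, which is why this works on a copy of the list.
my @copy    = @bad_chromosome;
my $renamed = grep { $_ = "chr$_" } @copy;
print "@copy\n";
```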

If you are using Perl 5.10 or higher, you can use the 'smart match'
operator (~~) instead of grep.
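Smart match has since been marked experimental (from Perl 5.18) and later deprecated, so the sketch below uses first() from List::Util, which ships with Perl, as an alternative. Like the 'if ... next' loop, it stops at the first match. The is_bad_chromosome helper is hypothetical.

```perl
use strict;
use warnings;
use List::Util qw(first);

my @bad_chromosome = ( 6, 8, 14, 16, 18, 'Y' );

# Hypothetical helper: true if $chr is in @bad_chromosome.
# first() returns the first element for which the block is true (undef if
# none) and stops scanning there, unlike grep.
sub is_bad_chromosome {
    my ($chr) = @_;
    my $match = first { $_ eq $chr } @bad_chromosome;
    return defined $match;
}

for my $chr (qw(1 6 X Y)) {
    printf "%s => %s\n", $chr, is_bad_chromosome($chr) ? 'skip' : 'keep';
}
```

The defined test matters because first() returns the matching element itself, and an element such as "0" would be false even though it matched.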

HTH, Ken








