On 2012.01.26 12:59, Chris Stinemetz wrote:
Hello Tiago,

On Thu, Jan 26, 2012 at 11:08 AM, Tiago Hori <tiago.h...@gmail.com> wrote:
Hi All,

I need some help to get started on a script.

I have these huge data files, 16K rows and several columns. I need to parse
the rows into a subset of these 16K rows. Each row has an identifier made
up of 2 letters and 6 numbers, and the ones I want have specific letters:
they start with either C or D. So I know I can use regex, but I have been
trying to figure out the rest and I don't know where to start. This is the
first time I am trying to do something from scratch, so any suggestions
would be appreciated. I am not asking for the script but just some help on
how to go about it.

So, what I want to be able to do is retrieve all the rows that have
identifiers starting with C or D. Should I use arrays? Can I store each row
as one item (a set of information separated by tabs) in an array?


Yes, I would use an array to store the data and then use a regex to
extract the rows based on your criteria.

I put together a little sample program using fictitious data. You
should be able to apply the same concept to your needs.

***tested***

#!/usr/bin/perl
use warnings;
use strict;

while (<DATA>) {
   chomp;
   my @array = split;
   my $GeneID = $array[6];

   if ($GeneID =~ /^[CD]/) {    # field starts with C or D
     print $_,"\n";
   }
}

__DATA__
Line1 c 2 3 4 5 C 7 8 9
Line2 1 2 3 4 5 6 7 8 9
Line3 1 2 3 4 5 D 7 8 9
Line4 1 2 3 4 5 6 7 8 9
Line5 1 2 3 4 5 D 7 8 9
Line6 1 2 3 4 5 6 7 8 9

***output***
Line1 c 2 3 4 5 C 7 8 9
Line3 1 2 3 4 5 D 7 8 9
Line5 1 2 3 4 5 D 7 8 9
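
A note on the pattern, since it is easy to trip over: in a Perl regex, alternation binds more loosely than the anchor, so /^C|D/ reads as "starts with C, or contains a D anywhere", while /^[CD]/ reads as "starts with C or D". The two give the same result on the sample data above only because every field there is a single character. A minimal sketch of the difference, using made-up identifiers (the values in @ids are invented for illustration and are not from the original data):

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical identifiers, invented for illustration only.
my @ids = qw(CA123456 DB654321 AD000001 XY999999);

# /^C|D/ parses as (^C)|(D): starts with C, OR has a D anywhere.
my @loose = grep { /^C|D/ } @ids;

# /^[CD]/ uses a character class: the first character is C or D.
my @anchored = grep { /^[CD]/ } @ids;

print "loose:    @loose\n";       # CA123456 DB654321 AD000001
print "anchored: @anchored\n";    # CA123456 DB654321
```

AD000001 slips through the first pattern because of the D in its second position, which is probably not what is wanted when the requirement is "starts with C or D".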

This code discards the contents of @array and creates a new one on each iteration of the while loop. If I'm not mistaken, the OP needs to save the subset outside of the loop. I've modified your code a bit:

#!/usr/bin/perl
use warnings;
use strict;

my @array;

while ( <DATA> ) {

  chomp;

  my $GeneID = ( split( /\s+/, $_ ))[6];

  if ($GeneID =~ /^[CD]/) {
    push( @array, $_ );
  }
}

print "$_\n" for @array;

__DATA__
Line1 c 2 3 4 5 C 7 8 9
Line2 1 2 3 4 5 6 7 8 9
Line3 1 2 3 4 5 D 7 8 9
Line4 1 2 3 4 5 6 7 8 9
Line5 1 2 3 4 5 D 7 8 9
Line6 1 2 3 4 5 6 7 8 9

Note that the split() line does the same thing yours does, but eliminates the step of having to assign to a temporary array variable. All of the logic could technically be shortened to:

  if ( ( split( /\s+/, $_ ))[6] =~ /^[CD]/) {
    push( @array, $_ );
  }

...but oftentimes brevity is not nice on the eyes, especially in a program longer than five or six lines :)

Steve
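
For the files described in the original question (tab-separated rows, identifier of two letters followed by six digits), the same idea might look like the sketch below. It assumes the identifier sits in the first column, which the question does not actually state; select_rows() and the sample rows are hypothetical names and data made up for illustration.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Return the rows whose first tab-separated field looks like an
# identifier: C or D, one more letter, then exactly six digits.
# Assumption: the identifier is in the first column.
sub select_rows {
    my @rows = @_;
    my @kept;
    for my $row (@rows) {
        # Limit of 2: only the first field is needed, so stop splitting early.
        my ($id) = split /\t/, $row, 2;
        next unless defined $id;    # skip empty lines
        push @kept, $row if $id =~ /^[CD][A-Za-z]\d{6}$/;
    }
    return @kept;
}

# Hypothetical sample rows, invented for illustration.
my @sample = (
    "CA123456\t0.12\t0.98",
    "AB000001\t1.50\t2.25",
    "DX654321\t3.14\t2.72",
    "CD12345\t9.99\t0.01",    # only five digits, so it is skipped
);

print "$_\n" for select_rows(@sample);
```

With a real file, the rows would come from a filehandle opened on the data file instead of @sample, with chomp applied to each line as in the programs above. At 16K rows, holding the kept rows in an array is not a memory concern.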
