Re: Extracting Columns from tab delimited files

Jim Gibson Mon, 11 Feb 2013 07:50:38 -0800

On Feb 10, 2013, at 5:57 PM, Tiago Hori wrote:

> Hi All,
> 
> I am trying to force myself to not use one of Perl's modules to parse tab
> delimited files (like Text::CSV), so please be patient and don't tell me
> just to go and use them. I am trying to re-invent the wheel, so to speak,
> because as we do with science, we repeat experiments to learn about the
> process even though we know the outcome.


Coding your own solutions rather than using a module already built for the same 
purpose is perfectly all right, especially if you are learning Perl. If you are 
confident your data has a simple format and will not change, then you can parse 
it yourself. Keep in mind, however, that the Text::CSV module can handle more 
complicated cases. For example, what if your data fields can contain the 
separator character? In that case, the fields may be enclosed in quotes, 
or the embedded separator characters will have to be escaped (e.g., preceded by 
a '\' character or marked by some other means). The Text::CSV module can handle these
cases, plus it can read from a file or a scalar and deal with broken lines and 
other complexities. There is also the Text::CSV::XS module which includes C 
code for speed.
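To see why embedded separators matter, here is a minimal sketch using a made-up record whose first field is quoted and contains a tab. A plain split on tabs ignores the quotes, so one logical field comes back as two:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical record: the first field is quoted and contains a tab.
my $record = qq{"Smith\tJohn"\t42\tBoston};

# A naive split treats every tab as a separator, quoted or not.
my @naive = split /\t/, $record;

# Prints: 4 fields: "Smith | John" | 42 | Boston
# although the record holds only 3 logical fields.
printf "%d fields: %s\n", scalar(@naive), join( ' | ', @naive );
```

Handling the quotes correctly means writing a small state machine (or a careful regex), which is exactly the work Text::CSV already does.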

> 
> So I started by reading in the files one line at a time,
> putting those lines in arrays and matching a specific line of interest. With
> join I could then turn the array of interest into a scalar and print that
> out. That is almost what I wanted (see code below):
> 
> #! /usr/bin/perl
> use strict;
> use warnings;
> 
> my $filename_data = $ARGV[0];
> my $filename_target = $ARGV[1];
> my $line_number = 1;
> my @targets;
> 
> open FILE, "<", $filename_data or die $!;
> open TARGET, "<", $filename_target or die $!;

Lexical file handles are generally better, and it helps to include the file 
name in the error message:

open(my $file, '<', $filename_data) or 
  die( "Can't open $filename_data for reading: $!");
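Here is a small sketch of what that error message looks like when the open fails. The file name is made up and assumed not to exist; the die() is trapped with eval only so the message can be inspected:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical file name, assumed not to exist, so open() fails.
my $filename_data = 'no_such_file_for_demo.txt';

# Trap the die() so we can look at the message it produces.
my $error = '';
eval {
    open( my $file, '<', $filename_data )
        # The trailing "\n" suppresses die's "at SCRIPT line N" suffix.
        or die("Can't open $filename_data for reading: $!\n");
    close($file);    # a lexical handle also closes when it goes out of scope
    1;
} or $error = $@;

# The message names the file and includes the system error text,
# e.g. "No such file or directory" on most systems.
print $error;
```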

> 
> while (<TARGET>){
>    push (@targets, $_);
> }

You can replace the above (and the earlier 'my @targets;' declaration) with:

my @targets = <TARGET>;

You can also do this to remove the line ending characters from @targets:

chomp(@targets);
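A self-contained sketch of those two lines. To keep it runnable without a real targets file, it reads from an in-memory string (the gene names are made up):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stand-in for the targets file: an in-memory string with made-up names.
my $target_text = "gene1\ngene2\ngene3\n";
open( my $target_fh, '<', \$target_text )
    or die("Can't open in-memory targets: $!");

# In list context, <$target_fh> returns every remaining line at once.
my @targets = <$target_fh>;
close($target_fh);

# chomp on an array removes the trailing newline from each element.
chomp(@targets);

print join( ',', @targets ), "\n";    # prints: gene1,gene2,gene3
```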

> 
> close (TARGET);
> 
> while (<FILE>){
>    chomp;
>    my $line = $_;

You can read directly into a scalar, so there is no need for the $_ variable here:

while( my $line = <FILE> ) {
  chomp($line);
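One detail that makes this safe: Perl quietly wraps while (my $line = <FILE>) in a defined() test, so a last line such as "0" with no trailing newline (a false value in Perl) is still processed. A minimal sketch with made-up in-memory data:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up data: the last line is "0" with no trailing newline.
my $data = "first\tline\n0";
open( my $fh, '<', \$data ) or die("Can't open in-memory data: $!");

my @seen;
# Perl treats this as while (defined(my $line = <$fh>)).
while ( my $line = <$fh> ) {
    chomp($line);
    push @seen, $line;
}
close($fh);

print scalar(@seen), " lines read\n";    # prints: 2 lines read
```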

>    my @elements = split ("\t", $line);
>    my $row_name = $elements[0];
>    if ($line_number == 1){
> my $header = join("\t", @elements);

You are splitting $line, then joining it back up into $header. Why not just
copy the line?

my $header = $line;

> print $header, "\n";
> $line_number = 2;}
>    elsif($line_number = 2){

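That should be the numeric comparison operator ==. A single = assigns 2 to 
$line_number, and since 2 is true, that branch is always taken: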

    elsif( $line_number == 2 ) {
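Another option, not from your code, is to read the header line once before the loop starts, which removes the need for the $line_number flag altogether. A sketch with made-up in-memory data:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up tab-delimited data: a header row followed by two data rows.
my $data = "name\tvalue\ngene1\t10\ngene2\t20\n";
open( my $fh, '<', \$data ) or die("Can't open in-memory data: $!");

# Read and print the header once, before the main loop.
my $header = <$fh>;
die("Input is empty\n") unless defined $header;
print $header;    # still has its newline, so no "\n" is needed

# Every line read from here on is a data row.
my @rows;
while ( my $line = <$fh> ) {
    chomp($line);
    push @rows, $line;
}
close($fh);
```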

>          foreach (@targets){
>      chomp;
>              my $target = $_;
>              if ($row_name eq $target){
>  my $data = join("\t", @elements);
>          print $data,"\n";

Once again, just use $line.
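Since every target is compared against the first column, a hash of the target names can replace the inner foreach loop entirely. This lookup is a suggestion rather than part of your code, and the targets and data below are made up:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up targets and tab-delimited data (read from an in-memory string).
my @targets = qw(gene1 gene3);
my $data    = "name\tvalue\ngene1\t10\ngene2\t20\ngene3\t30\n";
open( my $fh, '<', \$data ) or die("Can't open in-memory data: $!");

# Lookup table: each target name maps to a true value.
my %is_target = map { $_ => 1 } @targets;

my $header = <$fh>;
print $header;

my @matched;
while ( my $line = <$fh> ) {
    chomp($line);
    my ($row_name) = split /\t/, $line;    # only the first column is needed
    if ( $is_target{$row_name} ) {
        print $line, "\n";                # no need to re-join the fields
        push @matched, $line;
    }
}
close($fh);
```

Each row now costs one hash lookup instead of a pass over all of @targets.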

>      }
>  }
>    }
> }
> 
> close (FILE);
> 
> Realistically, I don't want the whole row. So I started thinking about how to
> get specific columns. I started reading on the internet and the idea seems
> to be placing the arrays containing the lines in a hash indexed by the row
> names. So I did this:

There are several ways to extract individual columns from a tab-delimited line.

1. You can split the line into an array and make copies of specific elements:

my @fields = split("\t",$line);
my $name = $fields[0];
my $address = $fields[3];
my $zip = $fields[7];

2. You can use an array slice on the array:

my( $name, $address, $zip ) = @fields[0,3,7];

3. You can use an array slice on the return list from split:

my( $name, $address, $zip ) = (split("\t",$line))[0,3,7];

4. You can split the line into individual variables:

my( $name, $position, $salary, $address, $street, $city, $country, $zip ) = 
split("\t",$line);

5. You can use undefs to ignore columns you don't want:

my( $name, undef, undef, $address, undef, undef, undef, $zip ) = 
split("\t",$line);
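The approaches above can be checked against each other with a made-up eight-column record (column names follow the examples above). One caveat worth knowing: split drops trailing empty fields unless you pass a negative limit, so a row whose last columns are blank comes back shorter than you might expect.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up record with eight tab-separated columns.
my $line = join "\t",
    'Ann', 'Clerk', '50000', '1 Main St', 'Main', 'Springfield', 'USA', '12345';

# Slice of the list returned by split (approach 3).
my ( $name, $address, $zip ) = ( split /\t/, $line )[ 0, 3, 7 ];

# Placeholders with undef (approach 5) give the same values.
my ( $name2, undef, undef, $address2, undef, undef, undef, $zip2 ) =
    split /\t/, $line;

# Trailing empty fields: split discards them unless the limit is negative.
my $short     = "Bob\tClerk\t\t";
my @dropped   = split /\t/, $short;        # ('Bob', 'Clerk')
my @preserved = split /\t/, $short, -1;    # ('Bob', 'Clerk', '', '')

print "$name | $address | $zip\n";
print scalar(@dropped), " vs ", scalar(@preserved), " fields\n";
```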



