Re: Extracting Columns from tab delimited files

Tiago Hori Mon, 11 Feb 2013 07:56:49 -0800

Thanks, Jim.

This is awesome!


T.

On 2013-02-11, at 11:49 AM, Jim Gibson <jimsgib...@gmail.com> wrote:

> 
> On Feb 10, 2013, at 5:57 PM, Tiago Hori wrote:
> 
>> Hi All,
>> 
>> I am trying to force myself not to use one of Perl's modules for parsing
>> tab-delimited files (like Text::CSV), so please be patient and don't tell me
>> just to go and use them. I am trying to reinvent the wheel, so to speak,
>> because, as we do in science, we repeat experiments to learn about the
>> process even though we know the outcome.
> 
> Coding your own solutions rather than using a module already built for the 
> same purpose is perfectly all right, especially if you are learning Perl. If 
> you are confident your data has a simple format and will not change, then you 
> can parse it yourself. Keep in mind, however, that the Text::CSV module can 
> handle more complicated cases. For example, what if your data fields can 
> contain the separator character? In that case, your data fields may be 
> enclosed in quotes or the embedded separator characters will have to be 
> escaped (e.g., preceded by a '\' character or by some other means). The 
> Text::CSV module can handle these cases, plus it can read from a file or a 
> scalar and deal with broken lines and other complexities. There is also the 
> Text::CSV_XS module, which includes C code for speed.
> 
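To make the embedded-separator problem concrete, here is a minimal sketch using only core Perl. The sample line and the helper name naive_fields are hypothetical, invented for illustration. The line is meant to hold three fields, but the quoted second field contains a tab, so a plain split cuts it in two.

```perl
use strict;
use warnings;

# Hypothetical sample line: three intended fields, the second one
# quoted and containing an embedded tab.
my $line = join("\t", 'Smith', '"12 Main St' . "\t" . 'Apt 4"', '90210');

# Naive parsing: split on every tab and ignore the quotes.
# (naive_fields is a made-up name for this sketch.)
sub naive_fields {
    my ($text) = @_;
    my @parts = split(/\t/, $text);
    return @parts;
}

my @fields = naive_fields($line);
print scalar(@fields), " fields\n";
print "[$_]\n" for @fields;
```

The split yields four pieces rather than three, with the quoted field broken at its embedded tab. This is the kind of input where a quote-aware parser becomes worth the dependency.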
>> 
>> So I started by reading in the files and going through them one line at a
>> time, putting each line into an array and matching a specific line of
>> interest. With join I could then turn the array of interest into a scalar and
>> print that out. That is almost what I wanted (see code below):
>> 
>> #! /usr/bin/perl
>> use strict;
>> use warnings;
>> 
>> my $filename_data = $ARGV[0];
>> my $filename_target = $ARGV[1];
>> my $line_number = 1;
>> my @targets;
>> 
>> open FILE, "<", $filename_data or die $!;
>> open TARGET, "<", $filename_target or die $!;
> 
> Lexical file handles are generally better, and it helps to include the file 
> name in the error message:
> 
> open(my $file, '<', $filename_data) or 
>  die( "Can't open $filename_data for reading: $!");
> 
>> 
>> while (<TARGET>){
>>   push (@targets, $_);
>> }
> 
> You can replace the above with:
> 
> my @targets = <TARGET>;
> 
> You can also do this to remove the line ending characters from @targets:
> 
> chomp(@targets);
> 
>> 
>> close (TARGET);
>> 
>> while (<FILE>){
>>   chomp;
>>   my $line = $_;
> 
> You can read directly into a scalar, so no need for the $_ variable here:
> 
> while( my $line = <FILE> ) {
>  chomp($line);
> 
>>   my @elements = split ("\t", $line);
>>   my $row_name = $elements[0];
>>   if ($line_number == 1){
>> my $header = join("\t", @elements);
> 
> You are splitting $line, then joining it back up in $header. Why not just
> use $line directly?
> 
> my $header = $line;
> 
>> print $header, "\n";
>> $line_number = 2;}
>>   elsif($line_number = 2){
> 
> That should be 
> 
>    elsif( $line_number == 2 ) {
> 
>>         foreach (@targets){
>>     chomp;
>>             my $target = $_;
>>             if ($row_name eq $target){
>> my $data = join("\t", @elements);
>>         print $data,"\n";
> 
> Once again, just use $line.
> 
>>     }
>> }
>>   }
>> }
>> 
>> close (FILE);
>> 
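For reference, the inline suggestions above (lexical file handles, chomp on the whole array, reading straight into $line, the == comparison, and printing $line instead of re-joining) can be folded into one script. This is only a sketch of one way to combine them, not a definitive version. The sub name filter_rows is hypothetical, and wrapping the loop in a sub is an assumption made so the logic can be exercised without real files.

```perl
use strict;
use warnings;

# Return the header line plus every data row whose first column
# matches one of the target names. Both arguments are open, readable
# file handles. (filter_rows is a hypothetical name for this sketch.)
sub filter_rows {
    my ($data_fh, $target_fh) = @_;

    my @targets = <$target_fh>;
    chomp(@targets);                    # strip line endings once, up front

    my @out;
    my $line_number = 1;
    while ( my $line = <$data_fh> ) {
        chomp($line);
        if ( $line_number == 1 ) {
            push @out, $line;           # header row, kept as-is
            $line_number = 2;
        }
        else {
            my ($row_name) = split("\t", $line);
            next unless defined $row_name;      # skip empty lines
            foreach my $target (@targets) {
                push @out, $line if $row_name eq $target;
            }
        }
    }
    return @out;
}

# Run as a script only when two file names are given.
if ( @ARGV == 2 ) {
    my ($filename_data, $filename_target) = @ARGV;
    open(my $file, '<', $filename_data)
        or die("Can't open $filename_data for reading: $!");
    open(my $target, '<', $filename_target)
        or die("Can't open $filename_target for reading: $!");
    print "$_\n" for filter_rows($file, $target);
    close($file);
    close($target);
}
```

As in the original loop, a row is emitted once for each target line it matches, so duplicate names in the target file produce duplicate output rows.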
>> Realistically, I don't want the whole row. So I started thinking about how to
>> get specific columns. I started reading on the internet, and the idea seems
>> to be to place the arrays containing the lines in a hash indexed by the row
>> names. So I did this:
> 
> There are several ways to extract individual columns from a CSV line.
> 
> 1. You can split the line into an array and make copies of specific elements:
> 
> my @fields = split("\t",$line);
> my $name = $fields[0];
> my $address = $fields[3];
> my $zip = $fields[7];
> 
> 2. You can use an array slice on the array:
> 
> my( $name, $address, $zip ) = @fields[0,3,7];
> 
> 3. You can use an array slice on the return list from split:
> 
> my( $name, $address, $zip ) = (split("\t",$line))[0,3,7];
> 
> 4. You can split the line into individual variables:
> 
> my( $name, $position, $salary, $address, $street, $city, $country, $zip ) = 
> split("\t",$line);
> 
> 5. You can use undefs to ignore columns you don't want:
> 
> my( $name, undef, undef, $address, undef, undef, undef, $zip ) = 
> split("\t",$line);
> 
> 
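The approaches listed above can be checked against each other with a short sketch. The eight-column sample record is hypothetical, made up to follow the column order used in approach 4. The last two lines show one caveat that I believe is worth knowing when counting columns: by default split drops trailing empty fields, and a limit of -1 keeps them.

```perl
use strict;
use warnings;

# Hypothetical eight-column record, in the column order of approach 4.
my $line = join("\t", 'Ann', 'Engineer', '50000', '12 Main St',
                      'Main St', 'Springfield', 'USA', '90210');

# Approaches 1 and 2: split into an array, then slice it.
my @fields = split("\t", $line);
my ($name2, $address2, $zip2) = @fields[0,3,7];

# Approach 3: slice the list returned by split directly.
my ($name3, $address3, $zip3) = (split("\t", $line))[0,3,7];

# Approach 5: undef placeholders skip the unwanted columns.
my ($name5, undef, undef, $address5, undef, undef, undef, $zip5) =
    split("\t", $line);

print join('|', $name2, $address2, $zip2), "\n";
print join('|', $name3, $address3, $zip3), "\n";
print join('|', $name5, $address5, $zip5), "\n";

# Caveat: trailing empty columns are dropped unless the limit is -1.
my @short = split("\t", "a\tb\t\t");        # 2 elements
my @full  = split("\t", "a\tb\t\t", -1);    # 4 elements
print scalar(@short), " vs ", scalar(@full), "\n";
```

All three print statements produce the same name, address, and zip, so the choice between the approaches is mostly about readability and how many columns are needed.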
> 
> -- 
> To unsubscribe, e-mail: beginners-unsubscr...@perl.org
> For additional commands, e-mail: beginners-h...@perl.org
> http://learn.perl.org/
> 
> 
