On Feb 10, 2013, at 5:57 PM, Tiago Hori wrote:

> Hi All,
>
> I am trying to force myself to not use one of perl's modules to parse tab
> delimited files (like Text::CSV), so please be patient and don't tell me
> just to go and use them. I am trying to re-invent the wheel, so to speak,
> because as we do with science, we repeat experiments to learn about the
> process even though we know the outcome.

Coding your own solution rather than using a module built for the same
purpose is perfectly all right, especially while you are learning Perl. If
you are confident that your data has a simple format that will not change,
you can parse it yourself.

Keep in mind, however, that the Text::CSV module handles more complicated
cases. For example, what if a data field can contain the separator
character? Then the field must be enclosed in quotes, or the embedded
separator must be escaped (e.g., preceded by a '\' character or marked by
some other means). Text::CSV handles these cases, and it can also read from
a file or a scalar and deal with broken lines and other complexities. There
is also the Text::CSV_XS module, which includes C code for speed.

> So I started by reading in the files and going one line at a time,
> putting those lines in arrays and matching a specific line of interest. With
> join I could then turn the array of interest into a scalar and print that
> out. That is almost what I wanted (see code below):
>
> #!/usr/bin/perl
> use strict;
> use warnings;
>
> my $filename_data = $ARGV[0];
> my $filename_target = $ARGV[1];
> my $line_number = 1;
> my @targets;
>
> open FILE, "<", $filename_data or die $!;
> open TARGET, "<", $filename_target or die $!;

Lexical file handles are generally better, and it helps to include the file
name in the error message:

    open(my $file, '<', $filename_data)
        or die("Can't open $filename_data for reading: $!");

> while (<TARGET>){
>     push (@targets, $_);
> }

You can replace the loop above, together with the earlier 'my @targets;'
declaration, with:

    my @targets = <TARGET>;

You can also remove the line-ending characters from every element of
@targets in one step:

    chomp(@targets);

> close (TARGET);
>
> while (<FILE>){
>     chomp;
>     my $line = $_;

You can read directly into a scalar, so there is no need for the $_
variable here:

    while( my $line = <FILE> ) {
        chomp($line);

>     my @elements = split ("\t", $line);
>     my $row_name = $elements[0];
>     if ($line_number == 1){
>         my $header = join("\t", @elements);

You are splitting $line, then joining it back up in $header. Why not just:

    my $header = $line;

>         print $header, "\n";
>         $line_number = 2;}
>     elsif($line_number = 2){

That is an assignment (=), not a comparison (==), so the condition is
always true. It should be:

    elsif( $line_number == 2 ) {

>         foreach (@targets){
>             chomp;
>             my $target = $_;
>             if ($row_name eq $target){
>                 my $data = join("\t", @elements);
>                 print $data,"\n";

Once again, just use $line.

>             }
>         }
>     }
> }
>
> close (FILE);
>
> Realistically, I don't want the whole row. So I started thinking about how to
> get specific columns. I started reading on the internet and the idea seems
> to be placing the arrays containing the lines in a hash indexed by the row
> names. So I did this:

There are several ways to extract individual columns from a tab-delimited
line.

1. You can split the line into an array and copy specific elements:

    my @fields = split("\t",$line);
    my $name = $fields[0];
    my $address = $fields[3];
    my $zip = $fields[7];

2. You can use an array slice on the array:

    my( $name, $address, $zip ) = @fields[0,3,7];

3. You can use a list slice on the return list from split:

    my( $name, $address, $zip ) = (split("\t",$line))[0,3,7];

4. You can split the line into individual variables:

    my( $name, $position, $salary, $address, $street, $city, $country, $zip ) = split("\t",$line);

5. You can use undefs to ignore columns you don't want:

    my( $name, undef, undef, $address, undef, undef, undef, $zip ) = split("\t",$line);
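
To tie the suggestions together, here is a minimal sketch that combines lexical file handles, reading each line into a scalar, and an array slice for column selection. It is only one possible arrangement, not the original poster's program. The helper name select_columns, the sample data, and the chosen column indexes are made up for illustration. It also swaps the inner loop over @targets for a hash lookup, and it assumes that no field contains an embedded tab or quoting and that every row has all of the requested columns.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the header plus every row whose first field
# is listed in @$targets, keeping only the columns listed in @$cols.
#   $fh      - open filehandle on tab-delimited data (first line is the header)
#   $targets - array ref of row names to keep
#   $cols    - array ref of zero-based column indexes to keep
sub select_columns {
    my ($fh, $targets, $cols) = @_;

    # A hash lookup stands in for looping over the targets on every row.
    my %wanted = map { $_ => 1 } @$targets;

    # The first line is the header and is always kept.
    my $header = <$fh>;
    return unless defined $header;
    chomp($header);
    my @out = ( join("\t", (split(/\t/, $header, -1))[@$cols]) );

    while (my $line = <$fh>) {
        chomp($line);
        # A limit of -1 makes split keep trailing empty fields.
        my @fields = split(/\t/, $line, -1);
        next unless $wanted{ $fields[0] };
        push @out, join("\t", @fields[@$cols]);
    }
    return @out;
}

# Made-up sample data, read through an in-memory filehandle so the
# sketch runs without any external files.
my $data = join("\n",
    "name\tposition\tsalary\tcity",
    "alice\tengineer\t100\tOslo",
    "bob\tanalyst\t90\tLima",
    "carol\tmanager\t120\tRome",
) . "\n";

open(my $fh, '<', \$data)
    or die("Can't open in-memory data for reading: $!");
print "$_\n" for select_columns($fh, [ 'alice', 'carol' ], [ 0, 3 ]);
close($fh);
```

With real files you would open $ARGV[0] and $ARGV[1] in the same way, read and chomp the target names into an array, and pass that array by reference. If the data might ever contain quoted or escaped separators, Text::CSV remains the safer choice.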