On 9/17/07, Jonathan Lang <[EMAIL PROTECTED]> wrote:
snip
> Most of the replies have suggested using 'split( /\|/, $line )'.
> However, this ignores a potentially important aspect of common csv
> file formats - well, important to me, anyway - which is the
> interaction between quotes, field delimiters, and newlines:
snip

This is because most of the pipe-delimited files you see aren't really
full-blown CSV-type files; they are usually just fields delimited by
pipes (i.e. neither pipes nor end-of-line characters are allowed in
fields). If you have some pseudo-CSV-like pipe-delimited file you can
(after beating the person upstream who decided to use a custom file
format instead of XML or CSV) do something like the following.

#!/usr/bin/perl

use strict;
use warnings;
use Data::Dumper;

my @recs;      #holds completed records
my @fields;    #holds the record being built
my @leftovers; #holds unprocessed field pieces

while (<DATA>) {
    #the data to process are the leftover pieces
    #of the last line and the current line split on
    #pipe (but keep the pipes, they might be part of
    #a string)
    my @data = (@leftovers, split /(\|)/);

    #while there are still pieces to process
    while (@data) {
        #if the current piece does not
        #start with a quote we can treat
        #it normally
        unless ($data[0] =~ /^\s*"/) {
            #remove this field from the
            #unprocessed pieces
            my $field = shift @data;
            #skip it if it is a pipe
            next if $field =~ /^\|$/;
            #remove the \n if this is
            #the last piece
            chomp $field if @data == 0;
            #shove the field onto the
            #record being built
            push @fields, $field;
            #and start again with the next piece
            next;
        }

        #Fields that start with a quote require special
        #handling. These fields are not complete until
        #they have an even number of quotes
        my $i      = 0;
        my $quotes = 0;
        while ($i <= $#data and ($quotes == 0 or $quotes % 2)) {
            $quotes += $data[$i++] =~ y/"//;
        }

        #if the number of quotes is not even
        #then all of these pieces go at the start
        #of the next line
        last if $quotes % 2;

        #if the number of quotes is even then
        #join all of the pieces that make it even
        #and remove them from the unprocessed
        #pieces
        my $field = join '', splice @data, 0, $i;
        #remove the outer quotes
        $field =~ s/\s*"(.*)"\s*/$1/s;
        #turn the quoted quotes into normal quotes
        $field =~ s/""/"/gs;
        #add this field to the record that is
        #being built
        push @fields, $field;
        #and start again with the next piece
        next;
    }

    unless (@leftovers = @data) {
        #if there are no leftovers then
        #the record is finished; store a copy
        #of it and start a new one
        push @recs, [@fields];
        @fields = ();
    }
}

print Dumper \@recs;

__DATA__
"Harry|Sally"|Sleepless
Jack|"Jill ""Walker"""
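
For comparison, here is a minimal sketch of the plain case described above, where neither pipes nor newlines can appear inside a field and a bare split is enough. The name split_plain is just a hypothetical helper for illustration, and the -1 limit is my own addition (without it split silently drops trailing empty fields). The second call shows what goes wrong once a quoted field contains a pipe.

```perl
#!/usr/bin/perl

use strict;
use warnings;

#split one line of a plain pipe delimited file into fields
#(split_plain is a hypothetical helper name); the -1 limit
#keeps trailing empty fields, which split drops by default
sub split_plain {
    my ($line) = @_;
    chomp $line;
    return split /\|/, $line, -1;
}

#a plain line: every pipe really is a delimiter
my @plain = split_plain("Jack|Jill|\n");
print scalar(@plain), " fields: ", join(',', map { "[$_]" } @plain), "\n";

#a quoted field containing a pipe: split cuts it in two
my @quoted = split_plain(qq{"Harry|Sally"|Sleepless\n});
print scalar(@quoted), " fields: ", join(',', map { "[$_]" } @quoted), "\n";
```

The first line comes back as three fields (the last one empty), which is what you want. The second also comes back as three fields, with "Harry and Sally" as separate pieces, and that is exactly the case the longer script above is there to handle.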