Yet another way to do it*:

while (<DATA>) {
    chomp;

    # extract fields
    my @fields = split /"?,"|"?,(?=\d)/;

    # print the fields separated with *
    print join '*', @fields;
    print "\n";
}

*Presuming that fields which are *not* enclosed in quotes will begin with a digit.

*Note that fields can contain nested quotes as long as there isn't a comma next to them.
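
To make the two footnotes concrete, here is a small sketch that runs the same split on one record. The sub name split_record and the shortened record are made up for illustration; only the regex comes from the code above.

```perl
use strict;
use warnings;

# split_record is a hypothetical helper name, not part of the post above.
sub split_record {
    my ($line) = @_;
    # Split on a comma that is followed by a quote or by a digit,
    # swallowing a closing quote just before the comma if there is one.
    return split /"?,"|"?,(?=\d)/, $line;
}

# Shortened, made-up record in the style of the sample data.
my $line = '2006-02-18 12:00EE,"TEN DIGGB","N46433 FLGGT TT, BLDG 661-3",18,"11:40",1,25';
print join('*', split_record($line)), "\n";
```

On this record the comma inside the quoted address survives because it is followed by a space. A comma followed directly by a digit inside quotes would still be treated as a separator, and a quote at the very start or end of the line is left in place.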

Hans Meier (John Doe) wrote:
Wagner, David --- Senior Programmer Analyst --- WGO wrote on Tuesday, 21 February 2006, 23:55:

        Here is a small snippet of code (LABEL1) which appears to remove a comma
which lies between two double quotes. I run it and display the output, and
the one line which does have the comma is cleaned up. LABEL2
is a snippet of code which does not work, but to all appearances is the
same as my small snippet of code. The working code is AS 5.8.3 on Windows
XP, while the failing code is on Sun and is also 5.8.3.

        I am receiving some data and need to clean it up and also split it. I
prefer not to have to load any type of csv handler, and this works for the most part.

        I don't see the difference in the code other than the two different systems.
        Note: Moved this same code over (it didn't occur to me to try it earlier, my head
must be stuck). It runs and removes the , from within the double quotes.

        Has to be something simple that I am missing. Though been doing Perl for
quite a while, never really been good at the regex processing.

        Thanks.

Wags ;)
===========================================================================

LABEL1:
#!perl
use strict;
use warnings;

my $MyIn  = 0;
my $MyOut = 0;

my $MyHldData;
my $MyWrkFld;
my $MyWrkFldUpd;

while ( <DATA> ) {
   chomp;
   s/\r//g;
   next if ( /^\s*$/ );
   my $MyHldData = $_;

   if ( /"/ ) {
       printf "*1a* Looking at line with quotes\n";
       while ( /("[^"]+")/ ) {
           $MyWrkFld = $1;
           printf "*1*  <%s>",
                               $1;

           $MyWrkFldUpd = $MyWrkFld;

           if ( $MyWrkFld =~ /,/ ) {
               printf "<--Comma hit!!";
               $MyWrkFldUpd =~ s/[,"]//g;
               s/$MyWrkFld/$MyWrkFldUpd/g;
            }
            else {
               $MyWrkFldUpd =~ s/"//g;
               s/$MyWrkFld/$MyWrkFldUpd/g;
            }
           printf "\n";
        }
    }
    else {
       printf "No quotes in line %d\n",
                               $.;
       next;
    }
   printf "ln:<%5d>\nor:<%s>\nmd:<%s>\n",
                               $.,
                               $MyHldData,
                               $_
}
__DATA__
2006-02-18 12:00EE,"TBUTHGHN - GGTT","TEN DIGGB","TEN DIGGB","FHEZGG PEINT TD","7077 CBNTBLIDETGD GEY",2006-02-14 12:00EE,15,"10:05","0152785","2737526",1,1250,10,"892913494",1,25
2006-02-18 12:00EE,"TBUTHGHN - GGTT","TEN DIGGB","TEN DIGGB","EETGH EGCHENICEL","7840 BELBBE EVG",2006-02-15 12:00EE,16,"11:27","0107405","2846954",1,1167,3,"916708540",1,25
2006-02-18 12:00EE,"TBUTHGHN - GGTT","TEN DIGGB","TEN DIGGB","EETGH EGCHENICEL","7840 BELBBE EVG",2006-02-15 12:00EE,17,"13:47","0107405","2846954",1,456,1,"916708557",1,25
2006-02-18 12:00EE,"TBUTHGHN - GGTT","TEN DIGGB","TEN DIGGB","NEVEL DGPBT LGVGL HGPEIH EGGNT","N46433 FLGGT TT, BLDG 661-3",2006-02-16 12:00EE,18,"11:40","0164109","2500058",1,529,1,"1078754644",1,25

==================================================================

LABEL2:
        ....

   INPUTTP: while (<MYFILEIN>) {
       chomp;

       $in++;
       s/\r//g;

       next if ( /^\s*$/ );    # bypass blank lines

       if ( ! /,(\d+)$/ ) {
           printf "Expecting a csv line ending with the total number of times associated with\n";
           printf "a terminal, but did not get a hit!\n";
           printf "Data(%d):\<%-s>\n",
                               $.,
                               $_;
           diet(5, $MyFileIn);
        }

       $MyDtlCnt = $1;
       undef @MyWorka;
       undef @MyUnSortedData;

       if ( /"/ ) {
           printf "*1a* Looking at line with quotes\n";
           while ( /("[^"]+")/ ) {
               $MyWrkFld = $1;
               $MyWrkFldUpd = $MyWrkFld;
               if ( $MyWrkFld =~ /,/ ) {
                   $MyWrkFldUpd =~ s/[,"]//g;
                   s/$MyWrkFld/$MyWrkFldUpd/g;
                }
                else {
                   $MyWrkFldUpd =~ s/"//g;
                   s/$MyWrkFld/$MyWrkFldUpd/g;
                }
            }
        }

        .....
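
Whatever the cause on the Sun box turns out to be, one hazard shared by both snippets is that s/$MyWrkFld/.../ treats the captured field as a regex. If a field ever contains characters such as ( ) + ? | or [ ], the substitution can fail to match and the while loop never ends. The sample data does not show this, so take it as an assumption rather than a diagnosis. A minimal sketch of the same loop with the field quoted via \Q...\E (the sub name clean_quoted_fields and the test record are made up):

```perl
use strict;
use warnings;

# clean_quoted_fields is a hypothetical helper name for this sketch.
# It strips the double quotes, and any commas, from each quoted field.
sub clean_quoted_fields {
    my ($line) = @_;
    while ( $line =~ /("[^"]+")/ ) {
        my $field = $1;
        (my $clean = $field) =~ s/[,"]//g;
        # \Q...\E makes the field match literally, even if it
        # contains regex metacharacters such as ( ) + ? | [ ]
        $line =~ s/\Q$field\E/$clean/g;
    }
    return $line;
}

# Made-up record with parentheses inside the quoted field.
print clean_quoted_fields('1,"FLGGT TT, BLDG (661-3)",25'), "\n";
```

Since s/[,"]//g removes the quotes whether or not a comma is present, the if/else from the original collapses into one branch here.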


For fun I played around a bit with regexes, but I think the usage of a csv 
module is easier :-)

while (<DATA>) {
        chomp;

        # extract fields
        #
        my @fields=$_=~/((?:".*?")|(?:(?<=,).*?(?=,))|(?:(?<=,).*?$)|(?:^.*?(?=,)))/g;

        # remove quotes
        #
        $_=~s/"(.*?)"/$1/ for @fields;


        # print the fields separated with *
        #
        print join '*', @fields; print "\n";
}

__DATA__
2006-02-18 12:00EE,"TBUTHGHN - GGTT","TEN DIGGB","TEN DIGGB","FHEZGG PEINT TD","7077 CBNTBLIDETGD GEY",2006-02-14 12:00EE,15,"10:05","0152785","2737526",1,1250,10,"892913494",1,25
2006-02-18 12:00EE,"TBUTHGHN - GGTT","TEN DIGGB","TEN DIGGB","EETGH EGCHENICEL","7840 BELBBE EVG",2006-02-15 12:00EE,16,"11:27","0107405","2846954",1,1167,3,"916708540",1,25
2006-02-18 12:00EE,"TBUTHGHN - GGTT","TEN DIGGB","TEN DIGGB","EETGH EGCHENICEL","7840 BELBBE EVG",2006-02-15 12:00EE,17,"13:47","0107405","2846954",1,456,1,"916708557",1,25
2006-02-18 12:00EE,"TBUTHGHN - GGTT","TEN DIGGB","TEN DIGGB","NEVEL DGPBT LGVGL HGPEIH EGGNT","N46433 FLGGT TT, BLDG 661-3",2006-02-16 12:00EE,18,"11:40","0164109","2500058",1,529,1,"1078754644",1,25
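
For comparison, here is a sketch of the module route using Text::ParseWords, which ships with Perl, so nothing extra needs installing. It is not a full CSV parser (it also treats single quotes and backslashes specially, so an apostrophe in the data can trip it up), and the sub name csv_fields and the shortened record are made up for illustration.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# csv_fields is a hypothetical helper name for this sketch.
sub csv_fields {
    my ($line) = @_;
    # Second argument 0: drop the quotes around quoted fields.
    return parse_line(',', 0, $line);
}

# Shortened, made-up record in the style of the sample data.
my $line = '2006-02-16 12:00EE,18,"N46433 FLGGT TT, BLDG 661-3","11:40",1,25';
print join('*', csv_fields($line)), "\n";
```

The comma inside the quoted address stays part of its field, and unquoted fields do not need to start with a digit.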




