Re: Re: Matching/replacing

Rob Dixon Wed, 12 Nov 2003 10:06:15 -0800

Hi Robert.

Robert wrote:
>
> The problem even with doing redundant things is that the redundancies didn't
> clean up the extra white spaces in each line.
>
> I glob in the whole file, bad, I know, but it's what I know and what works,
> it also gives no overhead on the server and the script takes less than 30
> seconds to download, parse the file 5 different ways, do some character
> mappings, and other fun stuff.


Your file system should cache the file contents, so reading a record
at a time won't cost you more fetches than necessary. But OK, if you're
slurping the whole file then you need code like this:

  my $data;
  {
    local $/;
    open OLDFILE, "< $file" or die $!;
    $data = <OLDFILE>;
    close OLDFILE;
  }

and no 'while' loop. Slurping the whole file can be fine, but
in this case you're mixing two idioms: the record-at-a-time
'while' loop and the entire body in one string. You're additionally
confusing people by using $line for the entire file's contents rather
than a single record or 'line'.
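
For comparison, here is a minimal sketch of the record-at-a-time idiom, where $line really does hold one record. The sample data and the in-memory filehandle are my own assumptions so that the snippet runs without your bad.sql; in the real script you would open the file instead.

```perl
use strict;
use warnings;

# Hypothetical sample input held in memory so the sketch runs anywhere;
# in the real script this would come from the bad.sql file.
my $input = "  AA-1251 \t 12\"X500 FOIL \t 24.76 \n  BB-0002 \t 6'X10 TARP \t 9.50 \n";

open my $in, '<', \$input or die $!;

my @records;
while (my $line = <$in>) {    # one record per iteration
  chomp $line;
  $line =~ s/^\s+//;          # remove leading whitespace
  $line =~ s/\s+$//;          # remove trailing whitespace
  $line =~ s/\s*\t\s*/|/g;    # TAB plus surrounding whitespace becomes a pipe
  push @records, $line;
}
close $in;

print "$_\n" for @records;
```

No /m modifier is needed here, because each string holds exactly one line.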

> Here's an original line:
>          AA-1251     |12"X500 ALUMINUM FOIL (HVY) EA|     24.76 |     16.19
> |     15.79 |     15.40 |U5|ALCAN |     8.000 |     5.000 |     9.000 |B

Have you replaced all of the TAB characters with a pipe at this point?

> Here's the line before the cleaning:
> 5|         AA-1251     |12"X500 ALUMINUM FOIL (HVY) EA|     24.76 |
> 16.19 |     15.79 |     15.40 |245|3|     8.000 |     5.000 |     9.000 |B
> |         AA-1251
>
> Here's the same line after the script:
> 5| AA-1251 |12in. X500 ALUMINUM FOIL (HVY) EA| 24.76 | 16.19 | 15.79 | 15.40
> |245|3| 8.000 | 5.000 | 9.000 |B | AA-1251
>
> Here's the cleanup script as it stands right now.  The problem with the file
> that comes out is leading white space after each |
>
> sub cleanup{
>
> use strict;
>
> my $file = "/home/web/sales/info/bad.sql";
> my $newfile = "/home/web/sales/info/inventory.sql";
> my $line;
>
>         open (OLDFILE, "< $file");
>         open (NEWFILE, "> $newfile");
>         while ($line = <OLDFILE>)  {
> #               $line = $line =~ /^\s*(.*)\s*\n$/;
> $line =~ s/^ //mg;
> $line =~ s/ $//mg;
> $line =~ s/\t/|/mg;
> $line =~ s/\s+/ /mg;
> $line =~ s/^\s*//mg;
> $line =~ s/\s*$//mg;
> $line =~ s/\s*$//mg;
> ###  The following lines mod the files to reflect inches and feet
> ### $line =~ s/"/in./mg;
> ### $line =~ s/'/ft./mg;
> $line =~ s/(?<=\d)"/in. /mg;
> $line =~ s/(?<=\d)'/ft. /mg;
>
>                 print NEWFILE "$line\n";
>         }
>         close OLDFILE;
>         close NEWFILE;
>
>   print "$newfile has now been created\n";
> }

I'd really recommend doing this one record at a time, but this
mod should come close to working. (It compiles, but it's
untested.) You'll be nagged endlessly by people on this group
unless you slurp files only when you need to!

The main change is to remove leading and trailing whitespace
on each field with

  $data =~ s/\s*\t\s*/|/g
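
To see what that substitution does, here is a small sketch on made-up data (the sample strings are assumptions, not lines from your file). It also shows one caveat worth knowing when the whole file is in one string: \s matches newlines too, so a TAB sitting directly before a line end would take the line break with it.

```perl
use strict;
use warnings;

# Hypothetical two-record sample: TAB-separated fields with space padding.
my $data = "AA-1251  \t  24.76  \t  16.19\nBB-0002  \t  9.50  \t  8.75\n";

# Each TAB, together with the whitespace around it, becomes a single pipe.
(my $piped = $data) =~ s/\s*\t\s*/|/g;
print $piped;

# Caveat: an empty trailing field (TAB right before the newline) lets \s*
# swallow the newline and join two records.
my $edge = "A\t\nB\tC\n";
(my $joined = $edge) =~ s/\s*\t\s*/|/g;      # records run together
(my $kept   = $edge) =~ s/[ ]*\t[ ]*/|/g;    # only spaces trimmed, newline kept
print $joined, $kept;
```

If your data can contain empty trailing fields, the second pattern (trimming spaces only) may be the safer choice; that is a judgment call that depends on your file.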

Come back if you need any more explanation.

Cheers,

Rob


  use strict;

  sub cleanup{

    my $file = '/home/web/sales/info/bad.sql';
    my $newfile = '/home/web/sales/info/inventory.sql';

    my $data;
    {
      local $/;
      open OLDFILE, "< $file" or die $!;
      $data = <OLDFILE>;
      close OLDFILE;
    }

    # Change TAB characters embedded in whitespace to pipes
    #
    $data =~ s/\s*\t\s*/|/g;

    $data =~ s/^\s+//mg;   # remove all leading space
    $data =~ s/\s+$//mg;   # and trailing space

    # The following lines mod the files to reflect inches and feet
    #
    $data =~ s/(?<=\d)"/in./g;
    $data =~ s/(?<=\d)'/ft./g;

    open NEWFILE, ">$newfile" or die $!;
    print NEWFILE "$data\n";
    close NEWFILE;

    print "$newfile has now been created\n";
  }
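
As a quick check on the inches and feet substitutions, here is a minimal sketch with a made-up description string (an assumption, not one of your records). The lookbehind (?<=\d) means a quote mark is only replaced when it directly follows a digit, so ordinary quotation marks are left alone.

```perl
use strict;
use warnings;

# Hypothetical description containing both unit marks and ordinary quotes.
my $desc = q{12"X500 ALUMINUM FOIL "HVY" 6'X10};

(my $out = $desc) =~ s/(?<=\d)"/in./g;   # digit followed by " means inches
$out =~ s/(?<=\d)'/ft./g;                # digit followed by ' means feet

print "$out\n";
```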



