On Thu, Jul 23, 2009 at 18:43, Jenny Chen<qyjc...@yahoo.com> wrote:
> Hi All,
>
> I need some help with utf-8 string handling in Perl. I tried to trim utf-8 
> strings using Perl. Follow is the main portion of the codes, but it does not 
> work. Any help will be greatly appreciated.
>
> Jenny
> -----
>
> open (DIC_OLD, "<:utf8", $tmp_file) || die "can not open $tmp_file: ! \n";
> open (DIC_NEW, ">:utf8", $dictionary_file) || die "can not open $dictionary_file: ! \n";
> $max_len = 290;  # the max # of characters can be displayed
> while ($myline = <DIC_OLD>) {
>  chomp $myline;
>
>  #format: <phrase i="212" t="DNS Server 1"/>
>  if ($myline =~ /<phrase\s*i=/) { #skip headers
>      @col = split(/\"/, $myline);
>
>            if ( length($col[3]) > $max_len ) {
>      $tmp = $col[3];
>         $col[3] = substr($tmp, 0, ($max_len - 1)); # Trim the tail-end of the string leaving
>         $myline = join("\"", @col);
>  }
>  print DIC_NEW "$myline\n";
> }

Without data and a description of what "does not work" means we cannot
really help you, but I spot at least 4 problems with your code:

1. You aren't using the [strict][1] and [warnings][2] pragmas.
2. You are using old-style bareword filehandles instead of new-style
lexical filehandles.
3. You have ! instead of $! in your error messages.
4. It looks like you are trying to edit XML line by line using a
regex. This is a bad idea; use a parser like [XML::Twig][3] instead.
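
For point 4, a rough sketch of what the parser approach could look like
with XML::Twig. It has not been run against your data: the sample input,
the <dictionary> root element, and the trim_phrases helper are all made
up for illustration, and it assumes the text you want to shorten always
lives in the t attribute of a <phrase> element.

```perl
#!/usr/bin/perl

use strict;
use warnings;

use XML::Twig;

# Hypothetical helper: returns the XML with every over-long t attribute
# of a <phrase> element cut the same way as the substr call in your script.
sub trim_phrases {
    my ($xml, $max_len) = @_;

    my $twig = XML::Twig->new(
        twig_handlers => {
            # runs once for each <phrase> element as it is parsed
            phrase => sub {
                my ($t, $phrase) = @_;
                my $text = $phrase->att('t');
                return unless defined $text;
                if (length($text) > $max_len) {
                    $phrase->set_att(t => substr($text, 0, $max_len - 1));
                }
            },
        },
    );

    $twig->parse($xml);
    return $twig->sprint;
}

# Made-up sample; the <dictionary> root element is a guess.
my $sample = <<'END_XML';
<dictionary>
  <phrase i="212" t="DNS Server 1"/>
  <phrase i="213" t="Host"/>
</dictionary>
END_XML

# A tiny limit so that the sample actually gets trimmed.
print trim_phrases($sample, 6);
```

The parser takes care of quoting and escaping, so a double quote or an
entity inside an attribute value cannot shift the fields the way it
can with split.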

I would write your code like this:

#!/usr/bin/perl

use strict;
use warnings;

use File::Copy;
use File::Temp qw/tempfile/;

my $dictionary_file = shift;
my ($fh, $tmp_file) = tempfile;

#the $fh filehandle is needed to ensure that no one else
#gets this temporary file name, but once we are certain we
#have the name, we need to discard the filehandle because
#we want to copy the dictionary into it and open it as a
#read filehandle.
close $fh;

copy $dictionary_file, $tmp_file
        or die "could not backup $dictionary_file: $!\n";

open my $old, "<:utf8", $tmp_file
        or die "can not open $tmp_file: $!\n";

open my $new, ">:utf8", $dictionary_file
        or die "can not open $dictionary_file: $!\n";

my $max_len = 290;  # the max # of characters that can be displayed
while (my $myline = <$old>) {
        chomp $myline;

        #format: <phrase i="212" t="DNS Server 1"/>
        if ($myline =~ /<phrase\s*i=/) { #skip headers
                my @col = split /"/, $myline;

                if ( length $col[3] > $max_len ) {
                        # Trim the tail-end of the string leaving (what?)
                        $col[3] = substr $col[3], 0, $max_len - 1;
                        $myline = join '"', @col;
                }
        }
        # print every line, including the headers that were not trimmed
        print $new "$myline\n";
}
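
Because both files are opened with the :utf8 layer, length and substr
in the loop above count characters, not bytes, so the trim cannot cut a
multi-byte character in half. A small sketch to illustrate the
difference, using only the core Encode module; the trim_chars helper
and the sample string are made up for the example:

```perl
#!/usr/bin/perl

use strict;
use warnings;

use Encode qw(decode encode);

# Hypothetical helper: cut a decoded string to at most $max characters.
sub trim_chars {
    my ($text, $max) = @_;
    return length($text) > $max ? substr($text, 0, $max) : $text;
}

# The UTF-8 octets for "naive cafe" written with i-diaeresis and e-acute.
my $bytes = "na\xC3\xAFve caf\xC3\xA9";

# Decoding turns the octets into a string of characters, which is what
# the :utf8 layer does for you when reading from the filehandle.
my $chars = decode('UTF-8', $bytes);

printf "%d bytes, %d characters\n", length($bytes), length($chars);

# Trimming the decoded string keeps the two-byte character whole.
my $trimmed = trim_chars($chars, 3);
printf "trimmed to %d characters, %d bytes when encoded\n",
        length($trimmed), length(encode('UTF-8', $trimmed));
```

If the same substr were applied to the undecoded bytes, a cut after
three bytes would land in the middle of the two-byte character.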

[1] : http://perldoc.perl.org/strict.html
[2] : http://perldoc.perl.org/warnings.html
[3] : http://search.cpan.org/dist/XML-Twig/Twig.pm

-- 
Chas. Owens
wonkden.net
The most important skill a programmer can have is the ability to read.
