On Thu, Jul 23, 2009 at 18:43, Jenny Chen <qyjc...@yahoo.com> wrote:
> Hi All,
>
> I need some help with utf-8 string handling in Perl. I tried to trim utf-8
> strings using Perl. Follow is the main portion of the codes, but it does not
> work. Any help will be greatly appreciated.
>
> Jenny
> -----
>
> open (DIC_OLD, "<:utf8", $tmp_file) || die "can not open $tmp_file: ! \n";
> open (DIC_NEW, ">:utf8", $dictionary_file) || die "can not open $dictionary_file: ! \n";
> $max_len = 290; # the max # of characters can be displayed
> while ($myline = <DIC_OLD>) {
>     chomp $myline;
>
>     #format: <phrase i="212" t="DNS Server 1"/>
>     if ($myline =~ /<phrase\s*i=/) { #skip headers
>         @col = split(/\"/, $myline);
>
>         if ( length($col[3]) > $max_len ) {
>             $tmp = $col[3];
>             $col[3] = substr($tmp, 0, ($max_len - 1)); # Trim the tail-end of the string leaving
>             $myline = join("\"", @col);
>         }
>         print DIC_NEW "$myline\n";
>     }

Without data and a description of what "does not work" means we cannot
really help you, but I spot at least 4 problems with your code:

1. You aren't using the [strict][1] and [warnings][2] pragmas.
2. You are using old-style filehandles instead of new-style lexical
   filehandles.
3. You have ! instead of $! in your error messages.
4. It looks like you are trying to edit XML line by line using a regex;
   this is a bad idea. Use a parser like [XML::Twig][3] instead.

I would write your code like this:

    #!/usr/bin/perl

    use strict;
    use warnings;

    use File::Copy;
    use File::Temp qw/tempfile/;

    my $dictionary_file = shift;

    my ($fh, $tmp_file) = tempfile;
    #the $fh filehandle is needed to ensure that no one else
    #gets this temporary file name, but once we are certain we
    #have the name, we need to discard the filehandle because
    #we want to copy the dictionary into it and open it as a
    #read filehandle.
    close $fh;

    copy $dictionary_file, $tmp_file
        or die "could not backup $dictionary_file: $!\n";

    open my $old, "<:utf8", $tmp_file
        or die "can not open $tmp_file: $!\n";
    open my $new, ">:utf8", $dictionary_file
        or die "can not open $dictionary_file: $!\n";

    my $max_len = 290; # the max # of characters that can be displayed

    while (my $myline = <$old>) {
        chomp $myline;

        #format: <phrase i="212" t="DNS Server 1"/>
        if ($myline =~ /<phrase\s*i=/) { #skip headers
            my @col = split /"/, $myline;

            if ( length $col[3] > $max_len ) {
                # Trim the tail-end of the string leaving (what?)
                $col[3] = substr $col[3], 0, $max_len - 1;
                $myline = join '"', @col;
            }
            print $new "$myline\n";
        }
    }

[1]: http://perldoc.perl.org/strict.html
[2]: http://perldoc.perl.org/warnings.html
[3]: http://search.cpan.org/dist/XML-Twig/Twig.pm

-- 
Chas. Owens
wonkden.net
The most important skill a programmer can have is the ability to read.
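
To make point 4 more concrete, here is a minimal sketch of the same trim done with XML::Twig instead of split. It is only an illustration under assumptions: the helper name trim_phrases, the <dict> wrapper element, and the tiny limit of 5 used in the demo are all made up for the example, and the real dictionary file may be structured differently. It needs XML::Twig installed from CPAN.

```perl
#!/usr/bin/perl

use strict;
use warnings;

use XML::Twig;

# Hypothetical helper (not from the original post): trim the t attribute
# of every <phrase> element in an XML string and return the rewritten
# document as a string.
sub trim_phrases {
    my ($xml, $limit) = @_;

    my $twig = XML::Twig->new(
        twig_handlers => {
            # called once for each complete <phrase> element
            phrase => sub {
                my ($t, $phrase) = @_;
                my $text = $phrase->att('t');
                if (defined $text and length($text) > $limit) {
                    # same arithmetic as the code above: keep $limit - 1 characters
                    $phrase->set_att(t => substr($text, 0, $limit - 1));
                }
                return 1;
            },
        },
    );

    $twig->parse($xml);
    return $twig->sprint;
}

# Demo input; the <dict> root element is an assumption for illustration.
my $xml = '<dict><phrase i="212" t="DNS Server 1"/><phrase i="213" t="abc"/></dict>';
print trim_phrases($xml, 5), "\n";
```

The parser hands the attribute value over as a character string, so length and substr count characters rather than bytes, and quoting and attribute order no longer matter the way they do with split /"/. For a real file, $twig->parsefile($dictionary_file) can replace the parse call.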