Re: Encoding, locate, etc.

ende Wed, 19 Apr 2006 01:22:50 -0700


Thanks, Nobumi,

Your solution is not only shorter but also more precise and correct than my first attempt. But although it works better, it still doesn't find words that differ in accents or capitalization. That is, if you look for "Ángeles", it finds neither "Angeles" nor "angeles" nor "ángeles"...


Any idea how to address this issue?

Could I convert each line to ASCII, search the ASCII version, and still print the line in its original encoding?
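The ASCII idea asked about above can be done without a separate ASCII copy of the file: build a folded "search key" only for the comparison, and print the untouched line. This is a minimal sketch, assuming the core module Unicode::Normalize (bundled with Perl since 5.8) and assuming that both the file lines and the search words are already decoded character strings (e.g. read via `:encoding(MacRoman)` and decoded from UTF-8). `fold_key` and `accent_grep` are hypothetical helper names. It only strips accents that Unicode decomposes into a base letter plus a combining mark; letters such as "ø" or "ß" are left unchanged.

```perl
#!/usr/bin/perl
# Sketch: accent- and case-insensitive search that prints each matching
# line unchanged. fold_key and accent_grep are hypothetical helper names.
use strict;
use warnings;
use Unicode::Normalize qw(NFD);

binmode STDOUT, ':encoding(UTF-8)';

# Build a search key from a character string: NFD splits "Á" into
# "A" + U+0301 COMBINING ACUTE ACCENT, the s/// drops such nonspacing
# combining marks, and lc() removes case differences.
sub fold_key {
    my ($s) = @_;
    my $k = NFD($s);
    $k =~ s/\p{Mn}+//g;
    return lc $k;
}

# Return the original lines whose folded key contains any folded word.
sub accent_grep {
    my ($words, $lines) = @_;
    return () unless @$words;    # an empty pattern would match everything
    my $alt = join '|', map { quotemeta(fold_key($_)) } @$words;
    return grep { fold_key($_) =~ /$alt/ } @$lines;
}

# Demo data written with \x{} escapes so this file stays ASCII:
# \x{C1} is "Á", \x{E1} is "á", \x{E9} is "é".
my @lines = ("\x{C1}ngeles P\x{E9}rez\n", "Angeles Ruiz\n",
             "\x{E1}ngeles\n", "Luis\n");
print accent_grep(["\x{C1}ngeles"], \@lines);
```

Searching for "Ángeles" here should print the first three demo lines exactly as stored, accents included, because only the keys are folded.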




Thanks!


On 19/04/2006, at 7:46, Nobumi Iyanaga wrote:

Dear Ende,

I think you have already received replies to your query.
I don't know if my 2 cents will be helpful to you, but I will try. I don't understand exactly what you are trying to do, but...
On Apr 19, 2006, at 3:14 AM, ende wrote:
In Perl

#!/usr/bin/env perl
#
# telefonos.pl
#
# me 2006-04-07
#
#

binmode STDOUT, ":utf8";
use encoding 'utf8';
my $listin = "/Users/me/Documents/documentos/Familia/Casa/Telistin.txt";
my $alphax = "/Applications/Alpha/AlphaX.app";

if (!@ARGV) {    # no search words: just open the list in Alpha
    exec "open -a $alphax $listin";
}

if (! -e $listin){
    print "strange! file not found: $listin\n";
    exit 1;
}
open my $f, "<:encoding(MacRoman)", "$listin" or die "cannot open $listin: $!";
    my @todo = <$f>;
    close $f;


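    # PROBLEM
    # Note: map aliases $_ to each element of @ARGV, so utf8::decode()
    # decodes @ARGV in place; its return value is only a true/false flag,
    # so @args ends up holding flags, not the decoded words.
    my @args = map {utf8::decode($_)} @ARGV;
    my $re = join("|", @ARGV);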



    print grep(/$re/i, @todo), "\n";

    # also look in the Apple AddressBook
    foreach my $a (@ARGV) {
        system("abtool $a");
    }
I would do something like the following:

#!/usr/bin/perl

use utf8;
use Encode;

binmode (STDOUT, ":utf8");

my $re = join("|", @ARGV);
$re = decode ("utf8", $re);
my $listin = "/Users/me/Documents/documentos/Familia/Casa/Telistin.txt";
open my $f, "<:encoding(MacRoman)", "$listin" or die "cannot open $listin: $!";
while (<$f>) {
        chomp;
        print $_, "\n" if /$re/i;
}
close $f;
....
You can save this script as "Ende_test.pl" and call it as follows:
perl Ende_test.pl Ángeles angeles

If your "Telistin.txt" has lines containing:
Ángeles
Angeles
ángeles
angeles
the command above should print all of these lines. The trick is to decode the UTF-8 bytes of @ARGV into characters, using
use Encode;
$re = decode ("utf8", $re);

And you must have:

use utf8;

at the beginning to do a regex search on a UTF-8 string.
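A small sketch of why the decoding step matters: the shell passes arguments as UTF-8 bytes, so an undecoded "Á" is two bytes and a case-insensitive match cannot fold it to "á". Here the byte string is simulated with `Encode::encode`; `build_re` is a hypothetical helper, and `quotemeta` is an addition not in the scripts above, so that search words containing regex metacharacters match literally. (`use utf8` itself only affects non-ASCII literals written in the script's own source, which is why this sketch, using only `\x{...}` escapes, does not need it.)

```perl
#!/usr/bin/perl
# Sketch: compare a regex built from raw argument bytes with one built
# from decoded characters. build_re is a hypothetical helper name.
use strict;
use warnings;
use feature 'unicode_strings';   # use Unicode rules for case folding
use Encode qw(decode encode);

# Simulate what the shell passes in @ARGV: "Ángeles" as UTF-8 bytes.
my $raw   = encode('UTF-8', "\x{C1}ngeles");   # 8 bytes: "Á" takes two
my $chars = decode('UTF-8', $raw);             # 7 characters

# A line read through :encoding(MacRoman) is already characters ("ángeles").
my $line = "\x{E1}ngeles";

# Join the words into one case-insensitive alternation; quotemeta makes
# characters such as "." or "(" in a search word match literally.
sub build_re {
    my $alt = join '|', map { quotemeta($_) } @_;
    return qr/$alt/i;
}

my $re_bytes = build_re($raw);
my $re_chars = build_re($chars);

printf "bytes: %d, characters: %d\n", length $raw, length $chars;
print 'undecoded match: ', ($line =~ $re_bytes ? 'yes' : 'no'), "\n";
print 'decoded match:   ', ($line =~ $re_chars ? 'yes' : 'no'), "\n";
```

Only the pattern built from decoded characters should match "ángeles"; the byte-based one looks for a literal "Ã" followed by a control byte.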

I hope this is of some help for you.

All the best,

Nobumi

-------

Nobumi Iyanaga
Tokyo,
Japan



---- ende
