Thanks, Nobumi,

Your solution is not only shorter but also more precise and more correct than my first attempt. However, although it works better, it still doesn't find words that differ in accents and capitalization. That is, if you search for "Ángeles", it finds neither "Angeles" nor "angeles" nor "ángeles"...

Any idea how to address this issue?

Could I convert each line to ASCII and search the ASCII version, while still printing the line in its original encoding?
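
To make the idea concrete, here is a rough sketch of what I have in mind. It assumes the core module Unicode::Normalize is available; fold_accents and grep_folded are names I made up for illustration, and the sample lines are invented, not taken from my real file:

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD);

# Hypothetical helper: lowercase a character string and strip its accents.
sub fold_accents {
    my ($text) = @_;
    my $decomposed = NFD($text);   # "Á" becomes "A" plus a combining acute
    $decomposed =~ s/\p{Mn}//g;    # drop the combining marks
    return lc $decomposed;
}

# Hypothetical helper: return the ORIGINAL lines whose folded form
# contains the folded search word.
sub grep_folded {
    my ($word, @lines) = @_;
    my $needle = fold_accents($word);
    return grep { index(fold_accents($_), $needle) >= 0 } @lines;
}

# Invented sample data, written with escapes so the source stays ASCII.
my @lines = ("\x{C1}ngeles", "Angeles", "\x{E1}ngeles", "angeles", "Pedro");

binmode STDOUT, ":utf8";
print "$_\n" for grep_folded("\x{C1}ngeles", @lines);
```

The comparison is done on the folded copies only, so the lines that get printed keep their accents. In the real script the search word would still have to be decoded from @ARGV first.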



Thanks!


On 19/04/2006, at 7:46, Nobumi Iyanaga wrote:

Dear Ende,

I think you have already received replies to your query.

I don't know if my 2 cents are helpful for you, but I will try. I don't understand exactly what you are trying to do, but...:

On Apr 19, 2006, at 3:14 AM, ende wrote:

In Perl

#!/usr/bin/env perl
#
# telefonos.pl
#
# me 2006-04-07
#
#

binmode STDOUT, ":utf8";
use encoding 'utf8';

my $listin = "/Users/me/Documents/documentos/Familia/Casa/ Telistin.txt";
my $alphax = "/Applications/Alpha/AlphaX.app";

if (!@ARGV) {
    exec "open -a $alphax $listin";
}

if (! -e $listin){
    print "strange! file not found: $listin \n";
    exit 1;
}
open my $f, "<:encoding(MacRoman)", "$listin" or die "$listin no abre: $!";
    my @todo = <$f>;
    close $f;


    # PROBLEM
    my @args = map {utf8::decode($_)} @ARGV;
    my $re = join("|", @ARGV);



    print grep(/$re/i, @todo), "\n";

    # also look in the Apple AddressBook
    foreach my $a (@ARGV) {
        system("abtool $a");
    }


I would do something like the following:

#!/usr/bin/perl

use utf8;
use Encode;

binmode (STDOUT, ":utf8");

my $re = join("|", @ARGV);
$re = decode ("utf8", $re);
my $listin = "/Users/me/Documents/documentos/Familia/Casa/ Telistin.txt";

open my $f, "<:encoding(MacRoman)", "$listin" or die "$listin no abre: $!";
while (<$f>) {
        chomp;
        print $_, "\n" if /$re/i;
}
close $f;
....

You can save this script as "Ende_test.pl", and call it in the following way:

perl Ende_test.pl Ángeles angeles

If your "Telistin.txt" has lines containing:
Ángeles
Angeles
ángeles
angeles

the above command line should print all these lines. The trick is to decode @ARGV from UTF-8 bytes into Perl character strings, using

use Encode;
$re = decode ("utf8", $re);

And you should also have:

use utf8;

at the beginning, so that Perl treats any accented literals in the script source itself as UTF-8.
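
To illustrate why the decoding step matters, here is a minimal sketch. It simulates a command-line argument as raw UTF-8 bytes and compares it, with and without decoding, against a line that has already been decoded (as lines read through an :encoding layer are). The helper name matches is my own invention for this example:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Raw UTF-8 bytes for "Ángeles", as they would arrive in @ARGV.
my $raw_arg = "\xC3\x81ngeles";

# A decoded line containing "ángeles", like one read via :encoding(...).
my $line = decode("UTF-8", "\xC3\xA1ngeles");

# The same argument after decoding into a character string.
my $decoded = decode("UTF-8", $raw_arg);

# Hypothetical helper: case-insensitive match, returning 1 or 0.
sub matches {
    my ($pattern, $text) = @_;
    return $text =~ /$pattern/i ? 1 : 0;
}

print "raw bytes match: ", matches($raw_arg, $line), "\n";   # 0
print "decoded match:   ", matches($decoded, $line), "\n";   # 1
```

Without decoding, the two bytes of "Á" are seen as two unrelated characters, so the /i flag cannot pair them with "á".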

I hope this is of some help for you.

All the best,

Nobumi

-------

Nobumi Iyanaga
Tokyo,
Japan



---- ende


