Thanks Nobumi,
Your solution is not only shorter but also more precise and correct
than my first attempt. However, although it works better, it still
doesn't find words that differ in accents and capitalization. That is,
if you look for "Ángeles" it finds neither "Angeles" nor "angeles"
nor "ángeles"...
Any idea how to address this issue?
Could I convert the lines to ASCII and search each line in ASCII,
while still printing them in the original encoding?
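Something like the following is what I have in mind. It is only a
sketch: fold_accents and grep_folded are names I made up, the sample
lines stand in for Telistin.txt, and it assumes the Unicode::Normalize
module that ships with Perl. It also treats the search word as a
literal string, not as a regex.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                          # this source file contains UTF-8 literals
use Unicode::Normalize qw(NFD);

binmode STDOUT, ":encoding(UTF-8)";

# Hypothetical helper: decompose each character (NFD), then delete the
# combining marks, so "Á" becomes "A". Note that "ñ" also becomes "n".
sub fold_accents {
    my ($text) = @_;
    my $folded = NFD($text);
    $folded =~ s/\p{Mn}//g;
    return $folded;
}

# Hypothetical helper: compare folded copies, but return the original
# (still accented) lines so they are printed unchanged.
sub grep_folded {
    my ($word, @lines) = @_;
    my $folded_word = fold_accents($word);
    return grep { fold_accents($_) =~ /\Q$folded_word\E/i } @lines;
}

# Made-up sample data standing in for the lines of the real file.
my @sample = ("Ángeles\n", "Angeles\n", "ángeles\n", "angeles\n", "Pedro\n");
print grep_folded("Ángeles", @sample);
```

With the sample above this prints the four Ángeles variants and skips
Pedro. In the real script the word would come from @ARGV (decoded
first) and the lines from the file.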
Thanks!
On 19/04/2006, at 7:46, Nobumi Iyanaga wrote:
Dear Ende,
I think you have already received replies to your query.
I don't know if my 2 cents are helpful for you, but I will try. I
don't understand exactly what you are trying to do, but...:
On Apr 19, 2006, at 3:14 AM, ende wrote:
In Perl
#!/usr/bin/env perl
#
# telefonos.pl
#
# me 2006-04-07
#
#
binmode STDOUT, ":utf8";
use encoding 'utf8';
my $listin = "/Users/me/Documents/documentos/Familia/Casa/Telistin.txt";
my $alphax = "/Applications/Alpha/AlphaX.app";
if ([EMAIL PROTECTED]) {
exec "open -a $alphax $listin";
}
if (! -e $listin){
print "strange! file not found: $listin \n";
exit 1;
}
open my $f, "<:encoding(MacRoman)", "$listin" or die "$listin no abre: $!";
my @todo = <$f>;
close $f;
# PROBLEM
my @args = map {utf8::decode($_)} @ARGV;
my $re = join("|", @ARGV);
print grep(/$re/i, @todo), "\n";
# also look in the Apple AddressBook
foreach my $a (@ARGV) {
system("abtool $a");
}
I would do something like the following:
#!/usr/bin/perl
use utf8;
use Encode;
binmode (STDOUT, ":utf8");
my $re = join("|", @ARGV);
$re = decode ("utf8", $re);
my $listin = "/Users/me/Documents/documentos/Familia/Casa/Telistin.txt";
open my $f, "<:encoding(MacRoman)", "$listin" or die "$listin no abre: $!";
while (<$f>) {
chomp;
print $_, "\n" if /$re/i;
}
close $f;
....
You can save this script as "Ende_test.pl", and call it in the
following way:
perl Ende_test.pl Ángeles angeles
If your "Telistin.txt" has lines containing:
Ángeles
Angeles
ángeles
angeles
the above command line should print all these lines. The trick
is to convert @ARGV from UTF-8 bytes into Perl character strings, using
use Encode;
$re = decode ("utf8", $re);
And you should have:
use utf8;
at the beginning, so that any accented text written in the script
itself is also read as UTF-8.
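To make the point about decoding concrete, here is a small sketch. The
byte string is written out by hand to imitate what the shell passes in
@ARGV; that assumes your terminal sends UTF-8. The variable names are
only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;
use Encode qw(decode);

# What arrives in @ARGV: "Ángeles" as raw UTF-8 octets (Á is C3 81).
my $bytes = "\xC3\x81ngeles";

# The same word after decoding: a string of 7 characters.
my $chars = decode("UTF-8", $bytes);

# A line as it looks after being read through an :encoding(...) layer,
# i.e. already characters. Here it is "ángeles" (á is C3 A1 in UTF-8).
my $line = decode("UTF-8", "\xC3\xA1ngeles");

# The undecoded argument is compared octet by octet and fails;
# the decoded one matches, and /i folds Á and á together.
printf "bytes: length %d, match: %s\n",
    length($bytes), ($line =~ /$bytes/i ? "yes" : "no");
printf "chars: length %d, match: %s\n",
    length($chars), ($line =~ /$chars/i ? "yes" : "no");
```

The first line reports length 8 and no match, the second length 7 and
a match.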
I hope this is of some help for you.
All the best,
Nobumi
-------
Nobumi Iyanaga
Tokyo,
Japan
---- ende