On Sat, Apr 11, 2009 at 14:11, Kelly Jones <kelly.terry.jo...@gmail.com> wrote:
> I'm trying to convert UTF-8 to ASCII in Perl. Is there an easy way to
> do this?
>
> I tried Unicode::UTF8simple, but ended up w/ many ctrl-a's, which
> can't be right.
>
> I'm going for an extremely complete transliteration, so I want ETH
> (for example) to be converted to both "d" and "dh". In other words, my
> input is ONE string, but my return value is a LIST of strings.
>
> My goal: create an ASCII version of geonames' alternateNames table.

Hmm, I don't know of any functions off the top of my head that do that
sort of thing. You might try searching CPAN[1]. If you don't find
anything you like, I would start by building a table like

    my %utf8_to_ascii = (
        "\N{LATIN SMALL LETTER ETH}" => [ qw/ d dh / ],
    );

Note, to use "\N{LATIN SMALL LETTER ETH}" instead of "\x{F0}" you will
need to use the charnames[2] pragma (use charnames ':full';).

You could then break the string into individual characters and create
a list of all possible outcomes:

    #!/usr/bin/perl

    use strict;
    use warnings;

    my %map = (
        a => [ qw/ aa ab / ],
        e => ['y'],
    );

    for my $word (qw/ bad bed base /) {
        print "$word =>\n", map "\t$_\n", expand(romanize($word, \%map));
    }

    # produce a compact representation of the possible strings:
    # one array ref of alternatives per input character
    sub romanize {
        my ($word, $map) = @_;

        my @string;
        for my $char (split //, $word) {
            # characters without an entry in the map stand for themselves
            my @chars = $map->{$char} ? @{$map->{$char}} : ($char);
            push @string, \@chars;
        }
        return @string;
    }

    # expand the compact representation into all possible strings
    sub expand {
        my @string = @_;
        my @result;

        # base case: only one position left, so its alternatives are the answer
        return @{$string[0]} if @string == 1;

        # otherwise prefix every alternative for the first position onto
        # every expansion of the remaining positions
        for my $char (@{$string[0]}) {
            for my $string (expand(@string[1 .. $#string])) {
                push @result, "$char$string";
            }
        }
        return @result;
    }

1. http://search.cpan.org/
2. http://perldoc.perl.org/charnames.html

-- 
Chas. Owens
wonkden.net
The most important skill a programmer can have is the ability to read.
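
To tie the two pieces together, here is a minimal sketch of how the named-character table could be applied to real UTF-8 input. It is only an illustration under some assumptions: the sub name ascii_variants, the THORN entry, and the choice to replace unmapped non-ASCII characters with "?" are all made up for this example and are not from the original post. It also assumes the input arrives as raw UTF-8 bytes, which is why it is decoded with the core Encode module before being split into characters.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use charnames ':full';    # enables \N{...}; Perl 5.16 and later load it automatically
use Encode qw(decode);

# Hypothetical, deliberately tiny table: each non-ASCII character maps to
# every ASCII spelling we are willing to accept for it.
my %utf8_to_ascii = (
    "\N{LATIN SMALL LETTER ETH}"   => [ qw/ d dh / ],
    "\N{LATIN SMALL LETTER THORN}" => [ qw/ th / ],
);

# Return every ASCII rendering of a UTF-8 encoded byte string.
sub ascii_variants {
    my ($bytes, $map) = @_;

    # Decode the raw bytes into Perl characters first, so that split //
    # yields whole characters rather than individual bytes.
    my $text = decode('UTF-8', $bytes);

    my @results = ('');
    for my $char (split //, $text) {
        my @choices = $map->{$char}    ? @{ $map->{$char} }
                    : ord($char) < 128 ? ($char)
                    :                    ('?');   # assumption: flag unmapped characters

        # Extend every partial result with every choice for this character.
        my @next;
        for my $prefix (@results) {
            push @next, $prefix . $_ for @choices;
        }
        @results = @next;
    }
    return @results;
}

# "\xC3\xB0" is the UTF-8 encoding of U+00F0 (small eth).
print "$_\n" for ascii_variants("bla\xC3\xB0", \%utf8_to_ascii);
```

This builds the list iteratively instead of recursively, but it produces the same kind of result as romanize plus expand. Keep in mind that the number of strings is the product of the number of alternatives at each position, so it grows quickly for long names with many mapped characters.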