On Dec 16, 2008, at 12:20, <[email protected]> <[email protected]> wrote:
> On Tue, 16 Dec 2008 11:51:28 +0100, Robin Berjon <[email protected]> wrote:
>> Before putting that into a module, though, you might want to think
>> about what should happen to characters outside the [a-z0-9] range, as
>> \W will match differently based on locale. I'm not sure what the
>> recommended behaviour is for such cases.
> That's what I'm thinking about right now. I couldn't find a reference
> which says that \W matches differently based on locale.
From perlre:

    A "\w" matches a single alphanumeric character (an alphabetic
    character, or a decimal digit) or "_", not a whole word. Use "\w+"
    to match a string of Perl-identifier characters (which isn't the
    same as matching an English word). If "use locale" is in effect,
    the list of alphabetic characters generated by "\w" is taken from
    the current locale.
> Python can convert a UTF-8 string to an ASCII string and replace
> characters like "ä" with the closest equivalent character, "a". Is
> there such a thing for Perl?
There's a host of modules on CPAN that do things like that, but I
don't know whether any one of them is accepted as the best way to go.
The problem is that if you want to cover all your bases, it can become
a rather extensive problem. For instance, you might want to convert "é"
to "e", but do you want to map "北京" to "beijing"?
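For the narrow "é" to "e" case, a core-only approach needs no CPAN module. Decompose the string to NFD with Unicode::Normalize, then delete the combining marks. This is a minimal sketch, not one of the CPAN modules. strip_accents is a hypothetical name, and the function expects an already decoded character string.

```perl
use strict;
use warnings;
use utf8;                          # the source below contains non-ASCII literals
use Unicode::Normalize qw(NFD);    # core module since Perl 5.8

# Hypothetical helper: remove diacritics from a decoded character string.
# NFD splits "ü" into "u" + U+0308 COMBINING DIAERESIS; the substitution
# then deletes every nonspacing mark (\p{Mn}).
sub strip_accents {
    my ($text) = @_;
    my $decomposed = NFD($text);
    $decomposed =~ s/\p{Mn}+//g;
    return $decomposed;
}

binmode STDOUT, ':encoding(UTF-8)';
print strip_accents("Crème brûlée à Zürich"), "\n";   # Creme brulee a Zurich
print strip_accents("北京"), "\n";                     # 北京 (no decomposition)
```

Letters without a canonical decomposition, such as "ø", "ß" or "æ", come through unchanged, and so does "北京". Handling those requires an explicit transliteration table, which is where covering all your bases becomes extensive.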
The simple solution is probably to have one option that encodes to an
IRI-friendly form and another that encodes to a URI-friendly form, and
let people who want something more complicated roll their own. See
http://annevankesteren.nl/2004/08/uri-design for some thoughts related
to this, or http://www.w3.org/International/iri-edit/draft-duerst-iri-bis.html.
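The two options could look roughly like the sketch below. It rests on several assumptions. iri_slug and uri_slug are hypothetical names. "IRI-friendly" is taken to mean keeping letters and digits from any script (\p{L}, \p{N}). "URI-friendly" means percent-encoding the UTF-8 bytes of that result, which is how RFC 3987 maps an IRI to a URI. The \p{...} properties do not depend on the locale.

```perl
use strict;
use warnings;
use feature 'unicode_strings';     # Unicode rules for lc() on all strings (5.12+)
use Encode qw(encode);

# Hypothetical helper: IRI-friendly slug. Keeps letters and digits from any
# script; each run of anything else becomes a single "-". Assumes NFC input:
# in NFD text the combining marks (\p{M}) would be treated as separators.
sub iri_slug {
    my ($text) = @_;
    my $slug = lc $text;
    $slug =~ s/[^\p{L}\p{N}]+/-/g;
    $slug =~ s/^-+|-+$//g;         # trim leading and trailing dashes
    return $slug;
}

# Hypothetical helper: URI-friendly slug. Encodes the IRI slug as UTF-8 and
# percent-encodes every byte outside the RFC 3986 unreserved set.
sub uri_slug {
    my ($text) = @_;
    my $bytes = encode('UTF-8', iri_slug($text));
    $bytes =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $bytes;
}

print uri_slug("Caf\x{e9} Paris"), "\n";   # caf%C3%A9-paris
```

Deriving the URI form from the IRI form keeps the two options consistent. Decoding the percent-escapes of a URI slug gives back the IRI slug.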
But that doesn't address the locale issue. For that, be sure to toss in
a "no locale" (which is lexically scoped) or define your own character
classes instead of \w, \s, and friends.
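Defining your own character class can be as simple as spelling out [a-z0-9]. The explicit class matches the same characters whatever the locale. The sketch below assumes a lowercase ASCII slug is wanted, and ascii_slug is a hypothetical name. On Perl 5.14 and later, the /a regex modifier (for example s/\W+/-/ag) is another way to keep \w and \W in the ASCII range.

```perl
use strict;
use warnings;

# Hypothetical helper: lowercase ASCII slug. The explicit [a-z0-9] class
# does not change with the locale, unlike \w and \W.
sub ascii_slug {
    my ($text) = @_;
    my $slug = lc $text;
    $slug =~ s/[^a-z0-9]+/-/g;    # each run of other characters -> one "-"
    $slug =~ s/^-+|-+$//g;         # trim leading and trailing dashes
    return $slug;
}

print ascii_slug("Hello, World!"), "\n";   # hello-world
```

Characters outside the class, including accented letters and "_", become separators, so "Zürich" yields "z-rich". To keep the base letter, transliterate before slugging.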
--
Robin Berjon - http://berjon.com/
Feel like hiring me? Go to http://robineko.com/
_______________________________________________
List: [email protected]
Listinfo: http://lists.scsys.co.uk/cgi-bin/mailman/listinfo/catalyst
Searchable archive: http://www.mail-archive.com/[email protected]/
Dev site: http://dev.catalyst.perl.org/