On Mon, 2006-09-04 at 22:19 -0400, Mark Leisher wrote:
> Though I haven't checked myself, I wouldn't be surprised if Perl,
> Python, PHP, and a host of other programming languages weren't already
> doing this, making your concerns pointless. You would probably find it
> instructive to look at some lexical scanners.

To add a side note to this otherwise pointless conversation: the
ECMAScript (aka JavaScript) standard actually ignores all format
characters (general category Cf) in the source code. This has caused
a problem for Persian computing, because U+200C ZERO WIDTH NON-JOINER
is Cf and is used in Persian text. Brendan Eich is working on changing
the standard so that format characters are no longer ignored in string
literals (and probably in regexps too).

-- 
behdad
http://behdad.org/

"Commandment Three says Do Not Kill, Amendment Two says Blood Will Spill"
        -- Dan Bern, "New American Language"

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
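
To make the ZWNJ problem described above concrete, here is a minimal
sketch in Python. It is not how any particular JavaScript engine is
implemented; it only imitates the rule of dropping every Cf character
from source text before lexing. The helper name strip_format_chars and
the sample word are assumptions chosen for illustration.

```python
import unicodedata

ZWNJ = "\u200c"  # U+200C ZERO WIDTH NON-JOINER, general category Cf


def strip_format_chars(source: str) -> str:
    """Hypothetical helper: drop every character whose category is Cf."""
    return "".join(ch for ch in source if unicodedata.category(ch) != "Cf")


# A Persian word written with a ZWNJ between the prefix and the stem,
# wrapped in double quotes to stand in for a string literal in source code.
word = "\u0645\u06cc" + ZWNJ + "\u062e\u0648\u0627\u0647\u0645"
literal = '"' + word + '"'

stripped = strip_format_chars(literal)

print("category of U+200C:", unicodedata.category(ZWNJ))
print("length before/after:", len(literal), len(stripped))
print("ZWNJ survives:", ZWNJ in stripped)
```

Because the filter runs over the whole source text, the ZWNJ inside the
quoted string is lost along with any outside it, so the program ends up
with a different (misjoined) word than the one the author typed. Making
the filter skip string literals is the kind of change the post says is
being worked on.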
