On Mon, 2006-09-04 at 22:19 -0400, Mark Leisher wrote:
> Though I haven't checked myself, I wouldn't be surprised if Perl,
> Python, PHP, and a host of other programming languages weren't already
> doing this, making your concerns pointless. You would probably find it
> instructive to look at some lexical scanners. 

To add a sidenote to this otherwise pointless conversation, the
ECMAScript (aka JavaScript) standard actually strips all format
characters (general category Cf) from the source code.  This has caused
a problem for Persian computing, as U+200C ZERO WIDTH NON-JOINER is Cf
and is used in Persian text.  Brendan Eich is working on changing the
standard to not ignore format characters in string literals (and
probably regexps too).
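
To make the effect concrete, here is a rough Python sketch of what
dropping Cf characters does to a Persian word.  It only illustrates the
behaviour described above and is not how any ECMAScript engine is
actually implemented; strip_format_chars is a hypothetical helper name
and the sample word is my own choice.

```python
import unicodedata

ZWNJ = "\u200c"  # U+200C ZERO WIDTH NON-JOINER, general category Cf


def strip_format_chars(source: str) -> str:
    """Drop every character whose Unicode general category is Cf.

    Hypothetical helper: mimics a scanner that discards format
    characters everywhere, including inside string literals.
    """
    return "".join(ch for ch in source if unicodedata.category(ch) != "Cf")


# A Persian word written with a ZWNJ between the prefix and the stem
# (meem, farsi yeh, ZWNJ, khah, waw, alef, heh, meem).
word = "\u0645\u06cc" + ZWNJ + "\u062e\u0648\u0627\u0647\u0645"

stripped = strip_format_chars(word)

print(unicodedata.category(ZWNJ))  # general category of U+200C
print(len(word), len(stripped))    # the ZWNJ is silently lost
print(word == stripped)
```

Once the ZWNJ is gone, the letters on either side of it join, so the
string the program sees is no longer the text the author typed.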

-- 
behdad
http://behdad.org/

"Commandment Three says Do Not Kill, Amendment Two says Blood Will Spill"
        -- Dan Bern, "New American Language"


--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
