On Fri, 5 Jul 2002 17:22:57 -0700 "Arun v" <[EMAIL PROTECTED]> wrote:
> Hi
>
> Im a newbie to this unicode world.
> Im developing a EcmaScript Interpreter according to ECma 262 Standard,
> which states the input given to the ecma interpreter will be in UTF-16
> (normalised to Unicode Normalised form C) transformation format.
>
> I have a C program in Linux (which acts as scanner for the interpreter),
> now I wanna make it aware of UTF-16 transformed input. (I need not do
> any transformation or normalisation, but make my program understand the
> UTF-16 encoding.)

Do the identifiers and variable names of ECMAScript itself need to be in
Unicode, or just the strings? If it's just the strings, then decode each
one in your tokenizer (or I suppose you could build up each string in
your state machine). You will then need basic string operations for
UTF-16.

> P.N : also suggest me some good online resource of Unicode and UTF-16

See this FAQ. Its focus is UTF-8 on Unix, but UTF-16 follows the same
principles, and it will be a good jumping-off point.

http://www.cl.cam.ac.uk/~mgk25/unicode.html

Mike

--
http://www.eskimo.com/~miallen/c/jus.c

--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/
