On Wed, Mar 31, 2004 at 09:14:45AM +0200, elazar wrote:
> How can I convince flex and bison to do Hebrew (ISO-8859-8 or UTF-8)
> lexing or parsing?
> The question is divided into two parts:
> 1) How can I make it recognize Hebrew letters (so that flex won't
> reject them, telling me they are not defined)?
> 2) How can I represent Hebrew text to flex (i.e., tokenize "\ABA")?
IIUC bison/yacc has nothing to do with it: bison only consumes the
tokens that (f)lex produces. The problem here is how to use Hebrew
when defining those tokens.
ISO-8859-8 should probably be simpler. I saw that flex's man page
mentions 8-bit, so I figure it is 8-bit clean.
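To make that concrete: in ISO-8859-8 every Hebrew letter is a single
byte, so a plain character class should be enough. Here is a minimal
Python 3 sketch that derives the class from the Unicode range of the
letters (alef U+05D0 to tav U+05EA). The helper name single_byte_class
and the definition name HEB are only illustrative, not anything flex
defines.

```python
# Hebrew letters alef..tav occupy U+05D0..U+05EA in Unicode.
ALEF, TAV = 0x05D0, 0x05EA


def single_byte_class(first: int, last: int, encoding: str = "iso8859_8") -> str:
    """Build a flex character class for code points first..last.

    Assumes every code point maps to exactly one byte in `encoding`
    and that the resulting bytes are contiguous; raises otherwise.
    """
    encoded = [chr(cp).encode(encoding) for cp in range(first, last + 1)]
    if any(len(b) != 1 for b in encoded):
        raise ValueError("range is not single-byte in this encoding")
    values = [b[0] for b in encoded]
    if values != list(range(values[0], values[-1] + 1)):
        raise ValueError("encoded bytes are not contiguous")
    return "[\\x%02x-\\x%02x]" % (values[0], values[-1])


# Emit a line suitable for the definitions section of a flex file.
print("HEB  " + single_byte_class(ALEF, TAV))
```

This prints "HEB  [\xe0-\xfa]", which could go in the definitions
section and then be used in rules as {HEB}+ or similar.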
UTF-8 is more complicated. Here is the FAQ entry "Can I fake
multi-byte character support?" from flex's manual:

  Flex has in it a widespread assumption that the input is processed
  one byte at a time. Fixing this is on the to-do list, but is involved,
  so it won't happen any time soon. In the interim, the best I can suggest
  (unless you want to try fixing it yourself) is to write your rules in
  terms of pairs of bytes, using definitions in the first section:

      X \xfe\xc2
      ...
      %%
      foo{X}bar found_foo_fe_c2_bar();

  etc. Definitely a pain - sorry about that.

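Applied to Hebrew, the byte pairs are easy to work out: in UTF-8 the
letters U+05D0..U+05EA all share the lead byte 0xD7, followed by a
continuation byte 0x90..0xAA. A minimal Python 3 sketch that prints
such definitions follows. The function names utf8_pair_pattern and
utf8_literal and the definition names HEB and ABA are made up for
illustration; the sketch only handles ranges that stay within one
lead byte.

```python
def utf8_pair_pattern(first: int, last: int) -> str:
    """flex pattern for code points first..last in UTF-8.

    Assumes both ends encode to two bytes with the same lead byte,
    so the range collapses to: lead byte + class of second bytes.
    """
    lo = chr(first).encode("utf-8")
    hi = chr(last).encode("utf-8")
    if len(lo) != 2 or len(hi) != 2 or lo[0] != hi[0]:
        raise ValueError("range does not fit a single two-byte lead")
    return "\\x%02x[\\x%02x-\\x%02x]" % (lo[0], lo[1], hi[1])


def utf8_literal(text: str) -> str:
    """Spell out a fixed string as \\xNN escapes, one per UTF-8 byte."""
    return "".join("\\x%02x" % b for b in text.encode("utf-8"))


# Any Hebrew letter, alef (U+05D0) through tav (U+05EA).
print("HEB  " + utf8_pair_pattern(0x05D0, 0x05EA))
# The fixed word alef-bet-alef, written with escapes to stay ASCII-safe.
print("ABA  " + utf8_literal("\u05d0\u05d1\u05d0"))
```

This prints "HEB  \xd7[\x90-\xaa]" and "ABA  \xd7\x90\xd7\x91\xd7\x90".
Both lines could go in the definitions section, in the same way as
the X definition in the FAQ.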
--
Tzafrir Cohen +---------------------------+
http://www.technion.ac.il/~tzafrir/ |vim is a mutt's best friend|
mailto:[EMAIL PROTECTED] +---------------------------+