[antlr-dev] UTF8 file/input stream? (C Runtime)

snowball Sat, 05 Dec 2009 09:46:19 -0800

Hey guys, I was wondering if anyone out there has a patch that implements
a UTF8 stream already, or if I need to write one myself.


In the latter case, I am a bit confused about the structure of the stream
objects.

The UCS2 input stream, for instance, sets the character width to 2, and can
of course use the ASCII 8-bit file stream directly without needing to
decode.

I was wondering if the character width variable is actually used anywhere,
or if I can safely set it to 0 and decode UTF8 on the fly.

The alternative, which is suggested in the header, is to decode the UTF8
to UCS4 when reading the file, and use a UCS4 stream. This is easily
doable, but it seems like a waste of memory, since UCS4 takes four bytes
per character. (The cost of decoding UTF8 on the fly is negligible these
days.)
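
For reference, here is a rough sketch of the decoding step in plain C, just to
show how little work it is. It does not use any ANTLR runtime types; the names
utf8_decode_one, utf8_to_ucs4 and UTF8_REPLACEMENT are made up for this
example, and substituting U+FFFD for malformed input is an assumption about
the desired error handling, not something the runtime prescribes.

```c
#include <stddef.h>
#include <stdint.h>

#define UTF8_REPLACEMENT 0xFFFDu /* U+FFFD, emitted for malformed input */

/* Decode one code point starting at s[0], with n >= 1 bytes available.
 * Stores the code point in *out and returns the number of bytes consumed.
 * A malformed sequence yields U+FFFD and consumes a single byte. */
static size_t utf8_decode_one(const unsigned char *s, size_t n, uint32_t *out)
{
    uint32_t cp, min;
    size_t len, i;

    if (s[0] < 0x80) { *out = s[0]; return 1; }
    else if ((s[0] & 0xE0) == 0xC0) { len = 2; cp = s[0] & 0x1F; min = 0x80; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; cp = s[0] & 0x0F; min = 0x800; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; cp = s[0] & 0x07; min = 0x10000; }
    else { *out = UTF8_REPLACEMENT; return 1; } /* stray continuation or invalid lead */

    if (len > n) { *out = UTF8_REPLACEMENT; return 1; } /* truncated sequence */

    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) { *out = UTF8_REPLACEMENT; return 1; }
        cp = (cp << 6) | (uint32_t)(s[i] & 0x3F);
    }

    /* Reject overlong encodings, surrogates, and values past U+10FFFF. */
    if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
        *out = UTF8_REPLACEMENT;
        return 1;
    }

    *out = cp;
    return len;
}

/* Decode a whole buffer up front into dst (capacity cap code points).
 * Returns the number of code points written. */
static size_t utf8_to_ucs4(const unsigned char *src, size_t n,
                           uint32_t *dst, size_t cap)
{
    size_t i = 0, count = 0;

    while (i < n && count < cap) {
        i += utf8_decode_one(src + i, n - i, &dst[count]);
        count++;
    }
    return count;
}
```

Calling utf8_decode_one from the stream's "next character" routine gives the
on-the-fly variant; utf8_to_ucs4 is the decode-everything-first variant. Since
every code point takes at least one byte, a destination buffer of n code
points is always large enough for the second one.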

At any rate, if I am using UCS4 internally, the other issue is the string
factory class. I haven't looked much at it yet, but the uses of the
various methods in the string factory interface are confusing. It looks
like I need to implement a [my encoding] to [my encoding] function, an
'8bit' to [my encoding] function, and so on. I assume that '8bit' refers to
ASCII, but it's not really clear. So I was just wondering if I could get some
clarification on exactly what these interfaces need to do, so that I can
implement one properly.

On the other hand, as mentioned at the beginning of this post, if a UTF8
stream has already been implemented, all I would like is a link to it!

Cheers.
