Hey guys, I was wondering if anyone out there has a patch that implements a UTF8 stream already, or if I need to write one myself.
In the latter case, I am a bit confused about the structure of the stream objects. The UCS2 input stream, for instance, sets the character width to 2, and can of course use the ASCII 8-bit file stream directly without needing to decode. I was wondering whether the character width variable is actually used anywhere, or whether I can safely set it to 0 and decode UTF8 on the fly.

The alternative, which is suggested in the header, is to decode the UTF8 to UCS4 when reading the file and use a UCS4 stream. This is easily doable, but it seems like a waste of memory, since UCS4 stores every character in 4 bytes, and the cost of decoding UTF8 is negligible these days.

At any rate, if I am using UCS4 internally, the other issue is the string factory class. I haven't looked at it much yet, but the uses of the various methods in the string factory interface are confusing. It looks like I need to implement a [my encoding] to [my encoding] function, an '8bit' to [my encoding] function, and so on. I assume that '8bit' refers to ASCII, but it's not really clear.

So I was just wondering if I could get some clarification on exactly what these interfaces need to do, so that I can implement one properly. On the other hand, as mentioned at the beginning of this post, if a UTF8 stream has already been implemented, all I would like is a link to it!

Cheers.
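P.S. To make the two options concrete, here is a rough sketch in plain C of what I have in mind. It does not touch any of the runtime's stream or string factory types; the names utf8_decode_next and utf8_to_ucs4 are my own hypothetical helpers, and replacing malformed input with U+FFFD is just one possible policy. The first function is the "decode on the fly" step (one code point per call); the second is the "convert the whole buffer to UCS4 up front" variant built on top of it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define UTF8_REPLACEMENT 0xFFFDu /* U+FFFD, used here for malformed input */

/*
 * Decode one code point starting at src[0], reading at most len bytes.
 * Stores the result in *out and returns the number of bytes consumed.
 * Returns 0 only when len == 0; otherwise at least 1, so a caller
 * looping over a buffer always makes progress.
 * Malformed, truncated, overlong, surrogate, and out-of-range
 * sequences all produce U+FFFD.
 */
size_t utf8_decode_next(const uint8_t *src, size_t len, uint32_t *out)
{
    assert(src != NULL && out != NULL);
    if (len == 0) { *out = UTF8_REPLACEMENT; return 0; }

    uint8_t b0 = src[0];
    size_t need;      /* number of continuation bytes expected */
    uint32_t cp, min; /* min = smallest value allowed for this length */

    if (b0 < 0x80) { *out = b0; return 1; }
    else if ((b0 & 0xE0) == 0xC0) { need = 1; cp = b0 & 0x1F; min = 0x80; }
    else if ((b0 & 0xF0) == 0xE0) { need = 2; cp = b0 & 0x0F; min = 0x800; }
    else if ((b0 & 0xF8) == 0xF0) { need = 3; cp = b0 & 0x07; min = 0x10000; }
    else { *out = UTF8_REPLACEMENT; return 1; } /* stray or invalid lead byte */

    for (size_t i = 1; i <= need; i++) {
        if (i >= len || (src[i] & 0xC0) != 0x80) {
            /* Truncated or bad continuation: leave the offending byte
               unread so the next call can examine it as a lead byte. */
            *out = UTF8_REPLACEMENT;
            return i;
        }
        cp = (cp << 6) | (uint32_t)(src[i] & 0x3F);
    }

    if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        cp = UTF8_REPLACEMENT;
    *out = cp;
    return need + 1;
}

/*
 * Decode a whole buffer into UCS4. If dst is NULL, nothing is written
 * and the return value is the number of code points in the input, which
 * can be used to size the destination. Otherwise at most cap code points
 * are written and the number written is returned.
 */
size_t utf8_to_ucs4(const uint8_t *src, size_t len, uint32_t *dst, size_t cap)
{
    size_t n = 0;
    size_t pos = 0;
    while (pos < len) {
        uint32_t cp;
        if (dst != NULL && n == cap) break; /* destination is full */
        pos += utf8_decode_next(src + pos, len - pos, &cp);
        if (dst != NULL) dst[n] = cp;
        n++;
    }
    return n;
}
```

The memory trade-off I mentioned shows up directly here: the on-the-fly version only ever holds one uint32_t, while the up-front version needs 4 bytes per code point for the whole file. What the on-the-fly version gives up is cheap random access, since finding the n-th character means walking the bytes, which is presumably why a fixed character width matters to the stream in the first place.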
