Tatu Saloranta wrote:

...
I don't see the need to call 3 accessor methods to get
the raw char array as a significant performance block
-- it certainly does not even register on profiles I
have taken for parsing.
I didn't mean to suggest that the 3 method calls caused a performance problem. It's just somewhat awkward, and giving access to the underlying parser buffer is not very clean from a structural standpoint. The big potential advantage I see with a CharSequence-type approach is that it would allow the parser to avoid translating data to a char[] in the first place, instead returning characters directly from the byte stream input (or internal byte[]). For UTF-8 and UTF-16 this would be very easy to implement. It's not so easy for some other character encodings, but those could be handled by the current approach of translating everything to chars up front.
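
To make the CharSequence idea concrete, here is a rough sketch of a read-only view that decodes characters straight from a byte[] on demand. It covers only big-endian UTF-16, where every char is a fixed two bytes and charAt() is a simple index computation; UTF-8 would need variable-width decoding on top of this. The class name Utf16BeCharSequence and its constructor are hypothetical, made up for illustration, and are not part of StAX or any existing parser.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical read-only CharSequence view over UTF-16BE bytes (illustration only). */
final class Utf16BeCharSequence implements CharSequence {
    private final byte[] buf;
    private final int start;   // byte offset of the first char
    private final int length;  // length in chars (UTF-16 code units)

    Utf16BeCharSequence(byte[] buf, int byteOffset, int charCount) {
        if (byteOffset < 0 || charCount < 0
                || byteOffset + 2L * charCount > buf.length) {
            throw new IndexOutOfBoundsException("range outside buffer");
        }
        this.buf = buf;
        this.start = byteOffset;
        this.length = charCount;
    }

    @Override
    public int length() {
        return length;
    }

    @Override
    public char charAt(int index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        int i = start + 2 * index;
        // Combine high and low byte; no intermediate char[] is ever filled.
        return (char) (((buf[i] & 0xFF) << 8) | (buf[i + 1] & 0xFF));
    }

    @Override
    public CharSequence subSequence(int from, int to) {
        if (from < 0 || to > length || from > to) {
            throw new IndexOutOfBoundsException("bad range " + from + ".." + to);
        }
        // Shares the same byte[]; only offsets change.
        return new Utf16BeCharSequence(buf, start + 2 * from, to - from);
    }

    @Override
    public String toString() {
        // A String is created only if the caller explicitly asks for one.
        return new String(buf, start, 2 * length, StandardCharsets.UTF_16BE);
    }

    public static void main(String[] args) {
        byte[] raw = "hello".getBytes(StandardCharsets.UTF_16BE);
        CharSequence cs = new Utf16BeCharSequence(raw, 0, 5);
        System.out.println(cs.charAt(1));
        System.out.println(cs.subSequence(1, 4));
    }
}
```

The caller sees an ordinary CharSequence, so nothing about the parser's internal buffer leaks out, and character data is only converted when it is actually read.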

The only remaining area where non-shared Strings are
used are attribute values; and here DTD/schema-based
handling might allow sharing too (for enumerated
types). Or, for minor improvements, type-based
accessors could be used too. If there's interest, I
could experiment with Woodstox stax-parser -- adding
low-level typed accessors would be quite easy to do,
and would avoid String creation.
It'd be interesting to see how much parsing speeds up if you disable the creation of Strings for attributes. Maybe that's an easy test you could try?

As to typed accessors, I think they'd be somewhat useful but I expect they'd also be a lot of trouble. The main benefit I see is that they would make it simpler to substitute a binary data decoder for the parser, and I'm not all that thrilled by the idea of pure binary data streams. The binary formats would be based on schemas, so in theory different implementations should translate the same schema to compatible formats - but we can't even get a reasonable level of compatibility in the use of schemas for web services with *text* documents, so how much more difficult would it be to do this with binary formats?
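
For reference, the kind of low-level typed accessor being discussed could look roughly like the sketch below: it reads an int directly out of a char[] range, so no String is created for the attribute value. The class and method names (TypedAccess, parseInt) are hypothetical and are not taken from Woodstox or StAX; the sketch also assumes the value is a plain optionally-signed decimal with no surrounding whitespace.

```java
/** Hypothetical typed-accessor helper (illustration only, not a Woodstox API). */
final class TypedAccess {
    private TypedAccess() {}

    /**
     * Parses an optionally signed decimal int from buf[start .. start+len)
     * without allocating a String.
     */
    static int parseInt(char[] buf, int start, int len) {
        if (len <= 0) {
            throw new NumberFormatException("empty value");
        }
        int i = start;
        int end = start + len;
        boolean negative = false;
        char first = buf[i];
        if (first == '-' || first == '+') {
            negative = (first == '-');
            i++;
            if (i == end) {
                throw new NumberFormatException("sign without digits");
            }
        }
        // Accumulate as a negative number so Integer.MIN_VALUE is representable.
        int limit = negative ? Integer.MIN_VALUE : -Integer.MAX_VALUE;
        int result = 0;
        while (i < end) {
            int digit = buf[i++] - '0';
            if (digit < 0 || digit > 9) {
                throw new NumberFormatException("not a digit at offset " + (i - 1));
            }
            if (result < limit / 10) {
                throw new NumberFormatException("overflow");
            }
            result *= 10;
            if (result < limit + digit) {
                throw new NumberFormatException("overflow");
            }
            result -= digit;
        }
        return negative ? result : -result;
    }

    public static void main(String[] args) {
        // Simulates a parser buffer holding: width="640"
        char[] buffer = "width=\"640\"".toCharArray();
        System.out.println(parseInt(buffer, 7, 3));
    }
}
```

A parser would call something like this with its own buffer, offset and length for the attribute value, which is where the String creation would be avoided.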

 - Dennis
