02.11.2012 17:08, Michael Van Canneyt wrote:
> On Fri, 2 Nov 2012, Andrew Brunner wrote:
>> I think it would be a good solution and would even prove faster in controlled
>> environments. Plus, all data is stored as widestrings in the DOM.
>> The first question I have is: if there were such an option, would the patch
>> be accepted?
> I don't see how you can fix the problem. If the input is UTF-8 and the result
> must be converted to a widestring for the DOM, then a conversion MUST take
> place; there is no way to avoid it. And a conversion means scanning the input
> byte for byte.
> In any case, the input must be scanned byte for byte to detect all the tags.
> That's what makes XML slow and unusable for large amounts of data.
>> The next question is: what is the problem with the UTF-8 routine, such that it
>> left the offending byte sequence intact without converting the bytes in my
>> sample data?
> Without an error message, it is impossible to tell.
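Michael's point that a UTF-8 to widestring conversion must touch every input byte can be illustrated with a minimal Python sketch. This is not the FPC routine; the function name `utf8_to_utf16_units` is hypothetical, and it returns the UTF-16 code units a WideString would hold, validating only the common malformations (bad lead or continuation bytes, truncation, overlong forms, surrogates, values above U+10FFFF).

```python
from typing import List


def utf8_to_utf16_units(data: bytes) -> List[int]:
    """Decode UTF-8 bytes into UTF-16 code units (the contents of a WideString).

    Every input byte is read exactly once: the lead byte tells how many
    continuation bytes follow, and each continuation byte is checked.
    Raises ValueError on malformed input.
    """
    units: List[int] = []
    i, n = 0, len(data)
    while i < n:
        start = i
        b = data[i]
        if b < 0x80:                  # 1 byte: ASCII
            cp, extra = b, 0
        elif 0xC2 <= b <= 0xDF:       # 2 bytes
            cp, extra = b & 0x1F, 1
        elif 0xE0 <= b <= 0xEF:       # 3 bytes
            cp, extra = b & 0x0F, 2
        elif 0xF0 <= b <= 0xF4:       # 4 bytes
            cp, extra = b & 0x07, 3
        else:
            raise ValueError(f"invalid lead byte 0x{b:02X} at offset {start}")
        if i + extra >= n:
            raise ValueError(f"truncated sequence at offset {start}")
        for k in range(1, extra + 1):
            c = data[i + k]
            if (c & 0xC0) != 0x80:    # continuation bytes look like 10xxxxxx
                raise ValueError(f"bad continuation byte at offset {i + k}")
            cp = (cp << 6) | (c & 0x3F)
        i += extra + 1
        # Reject overlong forms, surrogate code points and values above U+10FFFF.
        if ((extra == 2 and cp < 0x800) or (extra == 3 and cp < 0x10000)
                or 0xD800 <= cp <= 0xDFFF or cp > 0x10FFFF):
            raise ValueError(f"invalid code point U+{cp:X} at offset {start}")
        if cp >= 0x10000:             # needs a surrogate pair in UTF-16
            cp -= 0x10000
            units.append(0xD800 | (cp >> 10))
            units.append(0xDC00 | (cp & 0x3FF))
        else:
            units.append(cp)
    return units


print([hex(u) for u in utf8_to_utf16_units("A\u00e9\u20ac\U0001F600".encode("utf-8"))])
# ['0x41', '0xe9', '0x20ac', '0xd83d', '0xde00']
```

Whatever the implementation language, the loop cannot skip bytes: each one either starts a sequence or must be verified as a continuation.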
In this case, the issue is not the encoding but the literal ESC (#27) character
used in the data. The XML 1.0 specification does not allow code points below 32,
except TAB, CR and LF, to appear in data, in either literal or escaped (&#27;) form.
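A minimal Python sketch of this rule, assuming XML 1.0: `is_xml10_char` mirrors the specification's `Char` production, and `parses_ok` (a hypothetical helper) uses the standard-library, expat-based `xml.etree.ElementTree` parser to show that ESC is rejected both as a literal character and as the character reference `&#27;`.

```python
import xml.etree.ElementTree as ET


def is_xml10_char(cp: int) -> bool:
    """True if code point cp matches the XML 1.0 `Char` production."""
    return (cp in (0x9, 0xA, 0xD)          # TAB, LF, CR
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


def first_invalid_char(text: str) -> int:
    """Index of the first character XML 1.0 forbids, or -1 if all are allowed."""
    for i, ch in enumerate(text):
        if not is_xml10_char(ord(ch)):
            return i
    return -1


def parses_ok(document: str) -> bool:
    """Hypothetical helper: does the stdlib (expat-based) parser accept it?"""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


print(first_invalid_char("red\x1b[0m"))   # 3
print(parses_ok("<a>\x1b</a>"))           # False: literal ESC
print(parses_ok("<a>&#27;</a>"))          # False: escaped ESC
print(parses_ok("<a>&#9;</a>"))           # True: TAB is allowed
```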
In other words, XML is the wrong technology for binary data, unless the data is
encoded into a textual form (Base64 or similar).
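Encoding the payload as Base64 keeps every character of the document inside the allowed range. A sketch in Python using only the standard `base64` and `xml.etree.ElementTree` modules; the element name `data` and the helper names `embed_binary`/`extract_binary` are illustrative assumptions.

```python
import base64
import xml.etree.ElementTree as ET


def embed_binary(tag: str, payload: bytes) -> str:
    """Serialize payload as an XML element whose text is Base64 (ASCII only)."""
    elem = ET.Element(tag)
    elem.text = base64.b64encode(payload).decode("ascii")
    return ET.tostring(elem, encoding="unicode")


def extract_binary(xml_text: str) -> bytes:
    """Parse the element back and decode its Base64 text to the original bytes."""
    elem = ET.fromstring(xml_text)
    return base64.b64decode(elem.text or "")   # empty element -> text is None


raw = b"\x1b[31mred\x1b[0m\x00\xff"   # contains ESC, NUL and a non-UTF-8 byte
doc = embed_binary("data", raw)
print(doc)
assert extract_binary(doc) == raw
```

The cost is roughly 33% more bytes plus an encode/decode pass, which is the usual trade-off for carrying binary data in a text format.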
Regards,
Sergei
_______________________________________________
fpc-devel maillist - fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel