Sergei Gorelkin <sergei_gorel...@mail.ru> wrote on 2 November 2012 at 14:32:
> 02.11.2012 17:08, Michael Van Canneyt wrote:
> >
> > On Fri, 2 Nov 2012, Andrew Brunner wrote:
> >
> >> I think it would be a good solution and even prove faster in controlled
> >> environments. Plus all data is stored as widestrings in the DOM.
> >>
> >> The first question I have is if there was such an option would the patch
> >> be accepted.
> >
> > I don't see how you can fix the problem. If the input is UTF8, and the
> > result must be converted to a widestring for the DOM, then a conversion
> > MUST take place, there is no way to avoid it.
> > And a conversion means scanning the input byte for byte.
> >
> > In each case, the input must be scanned byte for byte anyway, to detect
> > all the tags. That's what makes XML slow and unusable for large amounts
> > of data.
> >
> >> The next question is what is the problem with the utf8 routine that it
> >> left the offending byte sequence intact without converting the bytes in
> >> my sample data?
> >
> > Without error message, it is impossible to tell.
> >
> In this case, the issue is not encoding, but literal ESC (#27) code used in
> data. XML specification does not allow codepoints below 32, except TAB, CR
> and LF, to appear in data, both in literal and escaped forms.
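To make the rule quoted above concrete, here is a minimal sketch of a check against the Char production of XML 1.0. It is written in Python purely for illustration and is not the fcl-xml code; the names `find_illegal_xml_chars` and `_ILLEGAL_XML_CHARS` are made up for this example.

```python
import re

# Hypothetical helper, not part of any XML library.
# XML 1.0 Char production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
# The character class below matches everything OUTSIDE that set.
_ILLEGAL_XML_CHARS = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def find_illegal_xml_chars(text):
    """Return (offset, code point) pairs for characters outside Char."""
    return [(m.start(), ord(m.group())) for m in _ILLEGAL_XML_CHARS.finditer(text)]
```

With this, `find_illegal_xml_chars("data\x1bmore")` reports the ESC character as `(4, 27)`, while TAB, CR and LF pass without being reported.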
Actually, the specification only defines the legal characters and requires that
processors accept them. It does not say what to do with the other characters.

> In other words, XML is the wrong technology to work with binary data, unless
> it is encoded into textual form (Base64 or alike).

True.

Mattias
_______________________________________________
fpc-devel maillist - fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel