On Tue, Oct 12, 2010 at 9:47 PM, William Leslie <
[email protected]> wrote:

> How can we attribute the performance difference between these xml
> parsers to encoding? Where are the benchmarks?
>
> Memory usage of strings probably isn't as important as you think...


I think this is incorrect. In UNIX programs circa 1990, 20% of live
in-memory data on workstations was character string data. By 2000, that
number was closer to 60%. The proportion on servers is much higher. So size
of character representation matters both for memory usage reasons and for
cache bandwidth reasons - the latter probably more compelling than the
former.


> - for
> large strings, you are probably more interested in using a stream
> decoder than a great big in-memory string, and if that doesn't suit
> your use case, you probably want to implement your own string type,
> whether that be ropes or an array in utf-8 or whatever.
>

Possibly, and perhaps. But in either case you are no longer concerned with
the native string representation.


> For typical in-memory string manipulation, UCS-2 has served us well,
> and people usually work under the assumption that indexing or slicing
> a string by index-of-codepoint is O(1) (even if the strings resulting
> from the slice may not be valid). I think it is a useful assumption,
> and that programmers will continue to want cheap slices based on a
> vague if sometimes incorrect count of characters for the time being.
>

Perhaps more to the point, if you *don't* have this, you have to give up
either the XML content model or the XPath indexing model, neither of which
is optional in practice...
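To make the cost concrete, here is a minimal sketch (in Python, purely illustrative; the function names are mine, not from any parser discussed here) of why codepoint indexing is O(1) under a fixed-width encoding like UCS-2/UCS-4 but O(n) under a variable-width one like UTF-8:

```python
def codepoint_offset_utf8(buf: bytes, n: int) -> int:
    """Byte offset of the n-th codepoint in valid UTF-8.
    O(n): each codepoint is 1-4 bytes wide, so we must scan
    lead bytes from the start to find the n-th one."""
    i = 0
    for _ in range(n):
        b = buf[i]
        if b < 0x80:        # ASCII, 1 byte
            i += 1
        elif b < 0xE0:      # 2-byte sequence lead
            i += 2
        elif b < 0xF0:      # 3-byte sequence lead
            i += 3
        else:               # 4-byte sequence lead
            i += 4
    return i

def codepoint_offset_ucs4(n: int) -> int:
    """With a fixed-width encoding the same lookup is pure
    arithmetic: O(1), no scan."""
    return 4 * n
```

For example, in the UTF-8 encoding of `"aé€"` the codepoints are 1, 2, and 3 bytes wide, so reaching the third codepoint requires scanning past the first two; a fixed-width representation just multiplies. This is the trade the quoted O(1)-slicing assumption rests on, paid for in the memory and cache bandwidth discussed above.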


shap
_______________________________________________
bitc-dev mailing list
[email protected]
http://www.coyotos.org/mailman/listinfo/bitc-dev
