On Tue, Oct 12, 2010 at 9:47 PM, William Leslie <[email protected]> wrote:
> How can we attribute the performance difference between these xml
> parsers to encoding? Where are the benchmarks?
>
> Memory usage of strings probably isn't as important as you think...

I think this is incorrect. In UNIX programs circa 1990, about 20% of
live in-memory data on workstations was character string data. By 2000,
that number was closer to 60%, and the proportion on servers is much
higher. So the size of the character representation matters both for
memory usage and for cache bandwidth - the latter probably the more
compelling of the two.

> - for large strings, you are probably more interested in using a
> stream decoder than a great big in-memory string, and if that doesn't
> suit your use case, you probably want to implement your own string
> type, whether that be ropes or an array in utf-8 or whatever.

Possibly, and perhaps - but if so, then you aren't concerned about the
native string representation in the first place.

> For typical in-memory string manipulation, UCS-2 has served us well,
> and people usually work under the assumption that indexing or slicing
> a string by index-of-codepoint is O(1) (even if the strings resulting
> from the slice may not be valid). I think it is a useful assumption,
> and that programmers will continue to want cheap slices based on a
> vague if sometimes incorrect count of characters for the time being.

Perhaps more to the point: if you *don't* have O(1) indexing by
codepoint, you have to give up either the XML content model or the
XPath indexing model, neither of which is optional in practice...

shap
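[Not part of the original thread - a minimal sketch illustrating the
indexing trade-off being argued. The function names are made up for
illustration, and the UTF-8 decoder assumes well-formed input with no
error handling.]

```python
def utf8_codepoint_offset(buf: bytes, n: int) -> int:
    """Byte offset of the n-th codepoint in a UTF-8 buffer.

    Codepoints are 1-4 bytes wide, so finding the n-th one requires
    walking the buffer from the start: O(n) per lookup.
    Assumes well-formed UTF-8 (no validation).
    """
    i = 0
    for _ in range(n):
        lead = buf[i]
        if lead < 0x80:      # 1-byte sequence (ASCII)
            i += 1
        elif lead < 0xE0:    # 2-byte sequence
            i += 2
        elif lead < 0xF0:    # 3-byte sequence
            i += 3
        else:                # 4-byte sequence
            i += 4
    return i

def ucs2_codepoint_offset(n: int) -> int:
    """Under UCS-2 every codepoint is exactly 2 bytes wide, so the
    byte offset of the n-th codepoint is a single multiply: O(1)."""
    return 2 * n

s = "naïve café"                    # mixed 1- and 2-byte codepoints
buf = s.encode("utf-8")

# Codepoint 3 ('v') sits after 'n'(1) + 'a'(1) + 'ï'(2) = byte 4.
assert utf8_codepoint_offset(buf, 3) == 4
# UCS-2 never has to look at the data at all.
assert ucs2_codepoint_offset(3) == 6

# The size side of the argument: for ASCII-heavy data, UTF-8 is
# roughly half the footprint of a 2-byte-per-codepoint encoding.
assert len(s.encode("utf-8")) < len(s.encode("utf-16-le"))
```

This is the tension in the thread: the fixed-width representation buys
cheap codepoint indexing (which XPath-style addressing leans on) at the
cost of the larger footprint the memory/cache-bandwidth figures above
are concerned with.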
_______________________________________________
bitc-dev mailing list
[email protected]
http://www.coyotos.org/mailman/listinfo/bitc-dev
