On Tue, Mar 22, 2011 at 6:46 PM, Ben Kloosterman <[email protected]> wrote:
> As I mentioned, in C# the use of index leads to difficult-to-maintain
> code, but it's quite interesting that originally they expected people to
> use ToCharArray and possibly unsafe methods for lots of indexing...

That's a bit of history that I had not known, and it's very useful.

> ...but they added the indexing to string, and the end result is that
> people just use string, and the performance is normally always good
> enough.

Umm. Ben? Any chance that this is because they defined char as UCS16 and in
practice always implement String using the same heap data structure that
Vector<ucs16> uses? That is: indexing performance is good enough because
(a) it is constant time, and (b) its semantics are broken in exactly the
way we are hoping to avoid.

> For BitC, though, we have a more extreme performance requirement, but if
> we deny a char index (say we only support an index returning a string),
> people will use arrays/vectors more for such work (we also have the issue
> that any char we return must be UCS-4 and hence frequently requires
> conversion).

I don't know whether our performance issues are more extreme or not.

Maybe we're trying to solve the wrong end of the problem here. How hard can
it really be to get China to officially adopt a Western language? :-)

> .NET interop shouldn't be an issue; it's just a UTF16 string...

When I looked at this a year ago, I was appalled to learn that this just
isn't true. .NET strings are straddling the fence like crazy. While the
string representation is not defined, all implementations of .NET use
vector<UCS16>. Character indexing on a String returns a UCS16 unit that may
or may not be a well-formed Unicode code point. All of the interesting
string->string operations, however, are now attempting to operate on code
points. Substring is a case in point. In classic Microsoft form, the error
conditions when these operations get handed bad input are not really
specified.
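For what it's worth, Java's String has the same code-unit indexing semantics
being described for .NET here, so a small Java sketch (my illustration, not
from this thread) can show concretely what "constant time but broken
semantics" means: indexing is O(1), but charAt returns a UTF-16 code unit
that may be only half of a surrogate pair rather than a code point.

```java
public class CodeUnitIndexing {
    public static void main(String[] args) {
        // U+1F600 (GRINNING FACE) lies outside the BMP, so it occupies
        // two UTF-16 code units (a surrogate pair) inside the string.
        String s = "\uD83D\uDE00";

        // Indexing is constant time, but it indexes code units, not
        // code points: length() reports 2 for this single character.
        System.out.println(s.length());                      // 2
        System.out.println(s.codePointCount(0, s.length())); // 1

        // charAt(0) hands back a bare high surrogate -- not a
        // well-formed Unicode code point on its own.
        char c = s.charAt(0);
        System.out.println(Character.isHighSurrogate(c));    // true

        // codePointAt does the pairing and recovers U+1F600.
        System.out.println(Integer.toHexString(s.codePointAt(0))); // 1f600
    }
}
```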
The definition of a .NET string does not, in fact, guarantee that the
sequence of UCS16 code units constitutes a well-formed code point sequence,
and there are quite a number of operations whose defined error checks are
insufficient to guarantee the code-point well-formedness of Strings.
Notwithstanding this hole in the specification, there are many *other*
parts of the specification that appear to rely implicitly on the String
well-formedness constraint.

Of course, to extract those statements I had to scan about a billion pages
of Microsoft Standardese. It's altogether possible that I missed something
somewhere...

shap
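The same hole exists in Java, so a Java sketch (my own, offered as an
analogy rather than a statement about the .NET spec) can demonstrate how an
operation like substring manufactures an ill-formed string without any
error check firing: the damage only surfaces later, at encode time.

```java
import java.nio.charset.StandardCharsets;

public class IllFormedString {
    public static void main(String[] args) {
        // "a" followed by U+1F600, stored as a surrogate pair.
        String s = "a\uD83D\uDE00";

        // substring counts UTF-16 code units, so index 2 falls in the
        // middle of the pair. No exception is raised; the result is a
        // string whose code-unit sequence is NOT a well-formed code
        // point sequence (it ends in a lone high surrogate).
        String bad = s.substring(0, 2);
        System.out.println(Character.isHighSurrogate(bad.charAt(1))); // true

        // The ill-formedness surfaces downstream: the lone surrogate
        // cannot be encoded as UTF-8, so the encoder silently
        // substitutes a single replacement byte ('?').
        byte[] utf8 = bad.getBytes(StandardCharsets.UTF_8);
        System.out.println(utf8.length); // 2: 'a' plus the replacement
    }
}
```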
_______________________________________________
bitc-dev mailing list
[email protected]
http://www.coyotos.org/mailman/listinfo/bitc-dev
