> So here, concretely, is what I'm contemplating:
>
> 1. Strings will have unspecified internal representation, but well-formed
>    strings will contain code points, not code units.
>
> 2. There will be some form of string accessor object or set of functions.
>    The general idea is that you specify the index of the first desired
>    code point and the length, and get back a StringAccessor containing
>    those code points.

In most use cases returning a substring is sufficient, even when only one char at an index is wanted. I think the accessor is good, but is the usefulness gap between getting a substring and getting the underlying array and using that directly justified? (I think so.)

As I mentioned, in C# the use of indexes leads to difficult-to-maintain code. It's quite interesting that originally they expected people to use ToCharArray, and possibly unsafe methods, for heavy indexing, but they added indexing to string, and the end result is that people just use string and the performance is nearly always good enough.

For BitC, though, we have a more extreme performance requirement. If we deny a char index (say we only support an index returning a string), people will use arrays/vectors more for such work. (We also have the issue that any char we return must be UCS-4 and hence will frequently require conversion.)

> 3. The string accessor notion can be extended to other things such as
>    code units. The question then becomes: why shouldn't we simply decide
>    that vectors make perfectly fine holders of code units, and Strings
>    are the means for representing code points? The only objection I can
>    think of is that this is not (regrettably) how various native string
>    representations operate (e.g. .Net).
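To make the code point / code unit distinction in points 1 to 3 concrete, here is a rough sketch in Python (not BitC). The accessor function and its name are my own assumptions for illustration; nothing here is a proposed BitC API. Python's str happens to index by code point, which is the behaviour point 1 asks for.

```python
# Illustrative only: "accessor" is a hypothetical helper, not a BitC function.
s = "a\U0001D11Eb"  # 'a', MUSICAL SYMBOL G CLEF (U+1D11E), 'b'

# What a String would expose: one entry per code point.
code_points = [ord(c) for c in s]

# What a vector of UTF-16 code units would hold: the clef becomes a
# surrogate pair, so there is one more unit than there are code points.
utf16 = s.encode("utf-16-le")
code_units = [int.from_bytes(utf16[i:i + 2], "little")
              for i in range(0, len(utf16), 2)]

def accessor(text, first, length):
    """Return `length` code points starting at code point index `first`."""
    return text[first:first + length]

clef = accessor(s, 1, 1)  # a one-code-point string, never half a pair
```

Indexing the code unit vector at position 1 would give only the high surrogate, which is the ".NET strange indexing" case; the accessor can never split the pair.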
This was the original two-string-type design I suggested, as it's lightweight and simple and achieves both goals (a fast array and a programmer-friendly string). It would consist of a char array, whose use is discouraged but which a few functions accept, and a widely used string (which may wrap the char array and has a toArray method). The argument at the time against this was that it pollutes the library with two different string types, but in reality they have very different roles. You're not going to see many methods on the char array, as it's mainly do-it-yourself stuff.

.NET interop shouldn't be an issue: it's just a UTF-16 string, and the strange indexing is not our problem. We should be able to get string, as an opaque type, to use UTF-16 internally if the program uses a lot of interop. Note again that disallowing char ch = string[x] while allowing str ch = string.getSub(x, 1) or a StringAccessor prevents any issues with people trying to use .NET strings in BitC the way .NET does.

Ben
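P.S. A rough sketch of the two-type split, again in Python rather than BitC. The class name Str, the method names get_sub and to_array, and the choice of UTF-16 as the hidden representation are all assumptions for illustration only.

```python
class Str:
    """Opaque string: callers see code points, never the internal encoding."""

    def __init__(self, text):
        # Hidden representation. UTF-16 is assumed here to mimic a program
        # that does a lot of .NET interop; callers cannot observe it.
        self._units = text.encode("utf-16-le")

    def get_sub(self, first, length):
        # Substring by code point index; the result is a Str, not a char.
        text = self._units.decode("utf-16-le")
        return Str(text[first:first + length])

    def to_array(self):
        # The "do it yourself" type: a plain array of code point values.
        return [ord(c) for c in self._units.decode("utf-16-le")]

    def __str__(self):
        return self._units.decode("utf-16-le")

    # Deliberately no __getitem__: Str("abc")[1] raises TypeError, which
    # plays the role of disallowing char ch = string[x].

s = Str("a\U0001D11Eb")
one = s.get_sub(1, 1)   # still a Str holding a single code point
arr = s.to_array()      # [0x61, 0x1D11E, 0x62]
```

This version re-decodes on every call, so it says nothing about performance; it only shows where the boundary between the two types would sit.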
