On Tue, Mar 22, 2011 at 6:46 PM, Ben Kloosterman <[email protected]> wrote:

As I mentioned, in C# the use of index leads to difficult-to-maintain code, but it's quite interesting that originally they expected people to use ToCharArray and possibly unsafe methods for lots of indexing...

That's a bit of history that I had not known, and it's very useful.

It's quite interesting how it evolved, especially the way it's used. Maybe worth a thread with some other languages. Unsafe code and arrays were expected to be used often but are very rarely used; almost from the start most people used collections: ArrayList and later the generic type-safe List (it's quite nice how easy it is to substitute collections when needed). LINQ is becoming very popular, and people eventually seem to gravitate to the inversion of control pattern (which is almost an anti-pattern but helps simplify larger apps).

...but they added the indexing to string and the end result is people just use string and the performance is normally always good enough.

Umm. Ben? Any chance that this is because they defined char as UCS16 and in practice always implement String using the same heap data structure that Vector<ucs16> uses? That is: indexing performance is good enough because (a) it is constant time, and (b) its semantics are broken in exactly the way we are hoping to avoid.

Indexing is rarely used. People now use Replace a LOT, even when modifying a single char, which is inefficient since it returns a new string, but it doesn't matter in the scheme of things. String.Format is also popular, and hence the most important operations by far are Find (used by Replace) and concatenation. Older code did more indexing, but machines are so fast now that it's rarely needed, and indexing results in lots of nasty bugs due to foreign-language issues (4-byte chars break), unexpected locations for the find, and then calculations for offsets going awry.

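To make the "4-byte chars break" point concrete, here is a minimal sketch in Python rather than C#. It is not code from this thread: the helper names `utf16_units` and `is_surrogate` are made up for illustration. It models a string the way a vector of 16-bit units would store it and shows that a constant-time unit index can land on half of a surrogate pair, and that unit offsets drift away from character offsets.

```python
import struct

def utf16_units(text):
    """Return the UTF-16 code units of text as a list of ints (hypothetical helper)."""
    raw = text.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(raw) // 2), raw))

def is_surrogate(unit):
    """True if a 16-bit unit is one half of a surrogate pair."""
    return 0xD800 <= unit <= 0xDFFF

# 'a', then U+1D11E (a character outside the BMP), then 'b'.
sample = "a\U0001D11Eb"
units = utf16_units(sample)

# Three characters, but four 16-bit units: the middle character takes two.
print(len(sample), len(units))

# Indexing by unit is O(1), but units[1] is only the high surrogate,
# not a complete code point.
print(hex(units[1]), is_surrogate(units[1]))

# 'b' is character 2 but unit 3, so offsets computed in one scheme
# are wrong in the other.
print(sample.index("b"), units.index(ord("b")))
```

Any offset arithmetic that assumes one unit per character goes wrong as soon as such a character appears before the position of interest.
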
For BitC though we have a more extreme performance requirement, but if we deny a char index (say we only support an index returning a string) people will use arrays/vectors more for such work (we also have the issue that any char we return must be UCS-4 and hence frequently requires conversion).

I don't know whether our performance issues are more extreme or not.

In C# some of these things, like the XML parser and maybe even the regular expression object, are written in C. C# is a user application language which does stretch to OS and drivers but shows strains when it does.

Maybe we're trying to solve the wrong end of the problem here. How hard can it really be to get China to officially adopt a Western language? :-)

.NET interop shouldn't be an issue, it's just a UTF-16 string...

When I looked at this a year ago, I was appalled to learn that this just isn't true. .NET strings are straddling the fence like crazy. While the string representation is not defined, all implementations of .NET use vector<UCS16>. Character indexing on a String returns a UCS16 unit that may or may not be a well-formed Unicode code point.

Correct, but this does not concern us. BitC will output a UTF-16 string; in BitC we index it the BitC way, and in C# they index it the poor .NET way.

All of the interesting string->string operations, however, are now attempting to operate on code points. Substring is a case in point. In classic Microsoft form, the error conditions when these operations get handed bad input are not really specified.

Correct, the use of index is just bad programming for most user apps, which is why I'm sort of saying to remove it, to raise the barrier and force developers to use arrays if they want it.

Ben
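
As a rough illustration of the "index returning a string" idea, here is a minimal Python sketch. It is not BitC code and not anything proposed in the thread: the function name `code_point_at` is hypothetical, and it assumes the input is well-formed UTF-16. It indexes by code point over 16-bit storage, which means walking from the start, so the lookup is linear rather than constant time.

```python
def code_point_at(units, index):
    """Return the index-th code point of a UTF-16 unit list as a
    one-character string. Assumes well-formed input; O(n) scan."""
    pos = 0
    for _ in range(index):
        # A high surrogate means this character occupies two units.
        pos += 2 if 0xD800 <= units[pos] <= 0xDBFF else 1
    unit = units[pos]
    if 0xD800 <= unit <= 0xDBFF:
        low = units[pos + 1]
        # Recombine the surrogate pair into a single code point.
        return chr(0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00))
    return chr(unit)

# UTF-16 units for 'a', U+1D11E, 'b'.
units = [0x61, 0xD834, 0xDD1E, 0x62]

print(code_point_at(units, 1) == "\U0001D11E")
print(code_point_at(units, 2))

# The array route: convert once to full code points, then index in O(1).
code_points = [ord(ch) for ch in "a\U0001D11Eb"]
print(hex(code_points[1]))
```

The last three lines show the alternative of paying one up-front conversion to an array of full-width code points, after which indexing is constant time again; which cost is preferable depends on how often the indexing actually happens.
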
_______________________________________________ bitc-dev mailing list [email protected] http://www.coyotos.org/mailman/listinfo/bitc-dev
