On Tue, Sep 3, 2013 at 2:16 AM, David Jeske <[email protected]> wrote:
> On Mon, Sep 2, 2013 at 7:51 AM, Bennie Kloosteman <[email protected]> wrote:
>
>> Note the performance cost in C# strings:
>> str.Substring(IndexOf(lookupString), length) requires creating a new
>> string each time. We discussed a mutable ptr / length slice lookup
>> previously (even using a 64 bit pointer with the length in the high
>> bits), which would be nice, but it won't work with C# string as the
>> array is private.
>
> This discussion is also mixing levels of abstraction.
>
> A CLR implementation *could* contain an efficient string-slice
> implementation, without breaking compatibility for apps. Therefore WRT
> BitC, we should be asking questions like:

Agreed, this is what we should be asking. I said the same in my last post.

> (a) is it a problem that string is implemented using runtime tricks which
> are not available to other apps? how can we make these capabilities
> available?

There are quite a few tricks here, since string is crucial to the
performance of the language. Some of them can't change, namely:

- The GC hack to not mark.
- Creating objects whose length is determined at runtime (a CLR
  limitation).

Both of these hacks require sealed, and sealed creates more problems.
There is no easy way around this AFAIK. Others could be improved upon:

- Use of C copy / compare routines for the sake of SIMD, which the Mono
  JIT doesn't produce.

> (b) is it a problem that users can't author their own
> string-type-compatible string-slice type? If so, how should it be fixed?

Well, you can add a slice type to work with arrays. What is needed:

- String does not return its internal array; it creates a copy. This is
  because (a) C# does not have the concept of immutable arrays, and
  allowing direct access would make strings no longer immutable, and
  (b) the internal storage is not an array.
- A slice implementation would need support for value arrays and ref
  arrays (though we could possibly just make a ref array an object
  holding a value array).
- We need some sugar, as taking a slice of a string is currently only
  possible in unsafe code (and such slices need to be immutable).
- The standard lib needs to handle slices, e.g. many string operations
  should work on a string slice, and array<T> operations on an array<T>
  slice. It is very important that such slice types are in the base
  libraries so that they are commonly used.

Actually, String is just the data storage: you take a slice, just like a
borrowed pointer, and pass the slice around. That gives me an idea:

    string s1 = "Trick1";
    stringSlice s2 = &s1;   // sugar for "take slice"
    // or &s1[1,4]
    // or new stringSlice("Test", 1, 4)
    // or new stringSlice(byte[], 1, 4) using the unsafe string initializer

We now have our slices, and we can load the string with UTF8 data.
Programs don't really use string, just slices.

- It would be nice if a slice was a reference with the high bits masked
  for the length, but I don't see this being compatible with a relocating
  GC unless the GC is aware of it. The best we can do is this:

    nobox struct stringSlice   // aka faststring
    {
        string s1;
        int offsetAndlength;
        // all the string methods.
    }

We want both mutable and immutable slices, but a mutable one can only be
taken from a char[] copy, not from a string. On a positive note, we are no
longer dependent on the standard string impl (except for storage).

As for performance, I'm concerned about creating 2 objects (one should be
embedded, on the stack or even in registers). It should be better when you
take lots of Substrings and do the hard work (stack allocation of small
structs vs. new strings). To get the heap saving, string slice would need
to be used everywhere (libs, standard runtime etc.), which I think is
viable as long as we support most of the string methods.

> (c) is it a problem that a particular (open source) CLR (or BitC-VM) does
> not implement efficient string-slice, if so, just fix it.

It's more a question of why they needed to use those hacks and how we can
get around them. And what about non-open-source runtimes, e.g. the CLR on
Windows? That is a huge market to throw away, and you really want to do
benchmarks on Windows, not Mono (especially if you want polymorphic inline
caches while the project is still small). That said, I think for the CLR
you could alter the mscorlib CIL and remove sealed as part of a compile.

Anyway, wrapping up: I think we can do FastString as string slices, and
you could even write such code now with unsafe (this phrase comes up a
lot).

As far as things which can't be done go:

- Variable size can be done by hijacking string.
- A lot of the native code is just the use of faster C routines, and
  that's pretty hard to do without good SIMD.
- Slices and relocating GC.

Sorry about the sojourn. I think it was worth it; strings are just so
important in a runtime, at least for me .. :-)

I'm not great on type classes, but if you want a volunteer for the
standard lib, I will help.

Ben
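
For illustration only, the stringSlice struct described in this message could be approximated in plain C# roughly as follows. This is a sketch under assumptions, not an existing library type: the names StringSlice, Slice and IndexOf are hypothetical, the "nobox" qualifier has no C# equivalent and is dropped, and offset and length are kept in two separate int fields for clarity rather than packed into one offsetAndlength field. The readonly struct syntax needs C# 7.2 or later.

```csharp
using System;

// Hypothetical sketch: a value-type view over an existing string.
// No characters are copied until ToString() is called.
public readonly struct StringSlice
{
    private readonly string _source; // backing storage, shared with the original string
    private readonly int _offset;    // index in _source of the first character of the view
    private readonly int _length;    // number of characters in the view

    public StringSlice(string source, int offset, int length)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (offset < 0 || length < 0 || offset > source.Length - length)
            throw new ArgumentOutOfRangeException();
        _source = source;
        _offset = offset;
        _length = length;
    }

    public int Length => _length;

    // Indexing is relative to the start of the slice, not the backing string.
    public char this[int index]
    {
        get
        {
            if ((uint)index >= (uint)_length) throw new IndexOutOfRangeException();
            return _source[_offset + index];
        }
    }

    // Narrowing a slice only adjusts two integers; no new string is allocated.
    public StringSlice Slice(int offset, int length)
    {
        if (offset < 0 || length < 0 || offset > _length - length)
            throw new ArgumentOutOfRangeException();
        return new StringSlice(_source, _offset + offset, length);
    }

    // Searches only inside the view and returns a slice-relative index, or -1.
    public int IndexOf(char c)
    {
        int i = _source.IndexOf(c, _offset, _length);
        return i < 0 ? -1 : i - _offset;
    }

    // The only place a copy is made.
    public override string ToString() => _source.Substring(_offset, _length);
}
```

With this sketch, new StringSlice("Trick1", 1, 4) views the characters "rick" without allocating a second string, and calling Slice on it again stays allocation-free. Because the struct is immutable and only reads the backing string, it does not address the mutable case, which would still need a char[] copy as noted above.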
_______________________________________________ bitc-dev mailing list [email protected] http://www.coyotos.org/mailman/listinfo/bitc-dev
