On Tue, Sep 3, 2013 at 2:16 AM, David Jeske <[email protected]> wrote:

> On Mon, Sep 2, 2013 at 7:51 AM, Bennie Kloosteman <[email protected]> wrote:
>
>> Note the performance cost in C# strings: str.Substring(str.IndexOf(
>> lookupString), length) requires creating a new string each time. We
>> discussed a mutable ptr/length slice lookup previously (even using a 64
>> bit pointer with the length in the high bits), which would be nice, but it
>> won't work with C# string as the array is private.
>>
>
> This discussion is also mixing levels of abstraction.
>
> A CLR implementation *could* contain an efficient string-slice
> implementation, without breaking compatibility for apps. Therefore WRT
> BitC, we should be asking questions like:
>

Agreed, this is what we should be asking; I said the same in my last post.


>
> (a) is it a problem that string is implemented using runtime tricks which
> are not available to other apps? how can we make these capabilities
> available?
>

Quite a few tricks here, since string is crucial to the performance of the
language.

Some of them can't change, namely:
   - GC hack to not mark
   - Creating objects whose length is determined at runtime (a CLR
     limitation)

Both of these hacks require sealed, and sealed creates more problems.
No easy way around this AFAIK.

Others could be improved upon:
   - Use of C copy/compare routines for the sake of SIMD, which the Mono
     JIT doesn't produce.

> (b) is it a problem that users can't author their own string-type-compatible
> string-slice type? If so, how should it be fixed?

Well, you can add a slice type to work with arrays. What is needed:

- String does not return its internal array; it creates a copy. This is
  because
     a) C# does not have the concept of immutable arrays, and allowing
        direct access would make strings no longer immutable
     b) the internal storage is not an array. A slice implementation would
        need support for value arrays and ref arrays (though we could
        possibly just make a ref array an object with a value array)
- We need some sugar, as taking a slice of a string is already possible in
  unsafe code (and such slices need to be immutable).
- The standard lib needs to handle slices, e.g. many string operations
  should work on string slices, and array<T> operations on array<T> slices.
  It is very important that such slice types are in the base libraries so
  that they are commonly used.
  Actually String is just the data storage: you take a slice just like a
  borrowed pointer and pass the slice around. That gives me an idea...

string s1 = "Trick1";
stringSlice s2 = &s1;  // sugar for taking a slice, or &s1[1,4],
                       // or new stringSlice("Test", 1, 4),
                       // or new stringSlice(byte[], 1, 4) using the
                       // unsafe string initializer.

We now have our slices, as we can load string with UTF8 data. Programs don't
really use string, just slices.

- It would be nice if a slice was a reference with the high bits masked
  for length, but I don't see this being compatible with a relocating GC
  unless the GC is aware of it. The best we can do is this:


nobox struct stringSlice   // aka FastString
{
  string s1;               // backing storage

  int offsetAndLength;     // offset and length packed into one int

  // all the string methods.
}
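
To make the struct above a bit more concrete, here is a minimal sketch in
plain C# of how it might look today, without the hypothetical nobox keyword.
The name StringSlice, the 16/16 bit split of the packed int, and the methods
Slice and ContentEquals are my assumptions for illustration only; nothing in
the CLR or BitC defines them.

```csharp
using System;

// Hypothetical sketch of the slice idea: a value type that borrows a string
// and records an offset and a length packed into a single int.
// Assumption: 16 bits for each, so offsets and lengths are limited to 65535.
public readonly struct StringSlice
{
    private const int LengthBits = 16;
    private const int LengthMask = (1 << LengthBits) - 1;

    private readonly string source;        // backing storage, never copied
    private readonly int offsetAndLength;  // high 16 bits: offset, low 16 bits: length

    public StringSlice(string source, int offset, int length)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (offset < 0 || length < 0 || offset > LengthMask || length > LengthMask
            || offset + length > source.Length)
            throw new ArgumentOutOfRangeException();
        this.source = source;
        this.offsetAndLength = (offset << LengthBits) | length;
    }

    // Unsigned shift so that offsets with the top bit set decode correctly.
    public int Offset => (int)((uint)offsetAndLength >> LengthBits);
    public int Length => offsetAndLength & LengthMask;

    public char this[int index]
    {
        get
        {
            if ((uint)index >= (uint)Length) throw new IndexOutOfRangeException();
            return source[Offset + index];
        }
    }

    // Taking a sub-slice creates no new string; only the packed int changes.
    public StringSlice Slice(int start, int length)
    {
        if (start < 0 || length < 0 || start + length > Length)
            throw new ArgumentOutOfRangeException();
        return new StringSlice(source, Offset + start, length);
    }

    // Compare against a string in place, without materialising the slice.
    public bool ContentEquals(string other) =>
        other != null && other.Length == Length &&
        string.CompareOrdinal(source, Offset, other, 0, Length) == 0;

    // Producing a real string is the only operation here that copies.
    public override string ToString() =>
        source == null ? string.Empty : source.Substring(Offset, Length);
}
```

With this, new StringSlice("Trick1", 1, 4) covers the characters "rick", and
calling Slice(1, 2) on that covers "ic", all without allocating a string
until ToString is called. Note this keeps a normal GC-visible reference to
the string, so it sidesteps the relocating GC problem rather than solving it.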


We want both mutable and immutable slices, but a mutable one can only be
taken from a char[] copy, not from a string.
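
As a rough illustration of the mutable case, the sketch below uses the
existing ArraySegment<char> as a stand-in for a mutable slice over a char[]
copy. That choice is my assumption for the example; it is not the slice type
proposed here, and the class name MutableSliceDemo is made up.

```csharp
using System;

public static class MutableSliceDemo
{
    public static void Main()
    {
        string s1 = "Trick1";

        // ToCharArray copies the characters, so the string stays immutable.
        char[] copy = s1.ToCharArray();

        // ArraySegment<char> records an array, an offset and a count
        // without copying the data a second time.
        var slice = new ArraySegment<char>(copy, 1, 4);

        // Write through the slice into the copied storage.
        slice.Array[slice.Offset] = 'B';

        // The slice sees the change; the original string does not.
        Console.WriteLine(new string(slice.Array, slice.Offset, slice.Count));
        Console.WriteLine(s1);
    }
}
```

The cost is the one up-front copy in ToCharArray; after that, any number of
mutable slices can share the same char[].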


On a positive note, we are no longer dependent on the standard string impl
(except for storage). As for performance, I'm concerned about creating 2
objects (one should be embedded, on the stack or even in registers), but it
should be better when you take lots of substrings and do the hard work
(stack allocation of small structs vs new strings).

To get the heap saving, string slice would need to be used everywhere (libs,
standard runtime etc.), which I think is viable as long as we support most
of the string methods.


>
> (c) is it a problem that a particular (open source) CLR (or BitC-VM) does
> not implement efficient string-slice, if so, just fix it.
>

It's more a question of why they needed to use those hacks and how we can
get around them. And what about non-open-source, e.g. the CLR on Windows?
Kind of a huge market to throw away... and you really want to do benchmarks
on Windows, not Mono (especially if you want poly inline caches when the
project is still small). That said, I think for the CLR you could alter the
mscorlib CIL and remove sealed as part of a compile.


Anyway, wrapping up: I think we can do FastString as string slices, and you
could even write such code now with unsafe (this phrase comes up a lot).
As for things which can't be done:
- Variable size can be done by hijacking string
- A lot of native is just the use of faster C routines, and that's pretty
  hard to do without good SIMD.
- Slices and relocating GC

Sorry about the sojourn; I think it was worth it. Strings are just so
important in a runtime, at least for me :-) I'm not great on type
classes, but if you want a volunteer for the standard lib I will help.

Ben
_______________________________________________
bitc-dev mailing list
[email protected]
http://www.coyotos.org/mailman/listinfo/bitc-dev
