Re: [bitc-dev] GC, RC, and real time.

Bennie Kloosteman Wed, 16 Oct 2013 10:32:34 -0700

On Wed, Oct 16, 2013 at 2:28 AM, Jonathan S. Shapiro <[email protected]>wrote:

> On Tue, Oct 15, 2013 at 11:03 AM, Bennie Kloosteman <[email protected]>wrote:
>
>> On Tue, Oct 15, 2013 at 10:53 PM, Jonathan S. Shapiro 
>> <[email protected]>wrote:
>>
>>>
>>>  A value type can live on the heap, and when it does, it will be
>>> "wrapped" by a conventional object header. If you like, you can imagine
>>> that for every value type V there is a corresponding reference type V_ref,
>>> and the two are assignment compatible by dispensation.
>>>
>>
>> Yep, but that would be poor design in a ref-counted language; you would
>> often put many in a single object, try to use regions and the stack more, etc.
>>
>
> Umm. I didn't mean to suggest that *interior* value type instances would
> get headers added. Only the outermost. And in a reference counting design,
> that seems to be the only option in designs where boxed objects get a
> header at all.
>

I'm not too concerned about boxed objects (only that we can explicitly deny
or warn on boxing). What I'm saying is that we will get far more interior
value type objects than, say, Java, which only has them for base types, or
even C#. We will get (for high performance, though not for maintainability)
larger objects composed of many of these interior values. (Which, I know,
makes headers less of an issue :-) )

BTW, do we have a better name than "value types"? The name doesn't mean they
are always copied by value; they are really subobjects without a header.

> But you do (implicitly) raise the problem of whether interior references
> can be supported. That's a big can of worms, but it seems to be orthogonal
> to the GC vs. RC choice. That's something we should take up separately, I
> think.
>
>
>> Agreed, immutable is important; also note string implementation as an
>> array reference vs. an embedded char[]. Ref counting has huge implications
>> for design. It's going to be hard getting a standard lib that works well
>> with ref counting and a GC.
>>
>
> I'm not sure why ref counting should have any impact on string
> implementation. The question of whether the string payload immediately
> follows the header or is stored separately has more to do with relocation
> concerns than with GC/RC in my mind. Above a certain size, you really don't
> want the string payload stored contiguously.
>
> Can you explain what you see as the relationship between GC/RC and string
> design?
>

With RC, every time you have a string reference, the algorithm needs to put
the internal char array object reference on the stack to work on it (note
I used the term array reference), and that introduces a count. With an
embedded array you're really looking at string[0].
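
A rough sketch of that difference, as a toy model only: the class names
(Counted, SplitString, InlineString) are made up for illustration, and the
explicit retain/release calls stand in for whatever an RC runtime would
actually emit.

```python
class Counted:
    """Toy heap object with a reference count; logs every count update."""
    def __init__(self, tally):
        self.count = 1          # one owner at creation
        self.tally = tally      # shared list used as an update log

    def retain(self):
        self.count += 1
        self.tally.append("inc")

    def release(self):
        self.count -= 1
        self.tally.append("dec")


class SplitString(Counted):
    """String object that points at a separately counted char array."""
    def __init__(self, text, tally):
        super().__init__(tally)
        self.chars = Counted(tally)   # second object with its own count
        self.chars.data = text


class InlineString(Counted):
    """String whose characters live inside the string object itself."""
    def __init__(self, text, tally):
        super().__init__(tally)
        self.data = text              # embedded payload, no second object


def first_char_split(s):
    s.retain()                # reference to the string on the stack
    s.chars.retain()          # reference to the inner array on the stack
    c = s.chars.data[0]
    s.chars.release()
    s.release()
    return c


def first_char_inline(s):
    s.retain()                # only the string itself is counted
    c = s.data[0]             # effectively string[0]
    s.release()
    return c


split_tally, inline_tally = [], []
first_char_split(SplitString("bitc", split_tally))
first_char_inline(InlineString("bitc", inline_tally))
print(len(split_tally), len(inline_tally))
```

In this toy model the split layout logs four count updates for one access
and the inline layout logs two. Real runtimes can elide some of these, so
treat the ratio as illustrative rather than measured.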

Now if the internal array is a separately allocated value type then it's
not an issue, as it's really a pointer, not a reference (which leads into
whether internal references can be supported, or whether they are really
pointers or reference offsets).

The key point, though, is that it becomes much worse to have an object store
another object, hence RC has an impact on design. Which IMHO will make it
perform better as the libs mature.

>
>
>>
>>
>>
>>>> When I wrote that I was thinking a reference could be an 11-12 byte
>>>> structure with the pointer (or possibly masked high bits in a 64 bit
>>>> pointer, or a 32 bit pointer as an option on a 64 bit machine), some
>>>> flags (freeze / release immutable) and a counter, so you don't have to
>>>> have a header.
>>>>
>>>
>>> Adding size to the reference is far worse than having an object header.
>>> Unless you can play mapping games to implement lazy masking, masking is
>>> also expensive. On some machines you can use the VM system for masking,
>>> because some virtual caches aren't physically anti-aliased.
>>>
>>
>> I'm not convinced of that... ref counting in C++ does exactly that, and
>> the basic C++ implementation is much faster than Java using bits in the
>> header (let alone a new header field).
>>
>
> Measurements please, because this is *incredibly* counter-intuitive, and
> seems contrary to every hard measurement I know about concerning L1 cache
> misses. Unless the implementation has changed a lot since I last looked,
> the performance of C++ reference counting pointers is truly awful, and the
> size penalty of using them is pretty significant.
>

smart_ptr implementations do more and are slow. I don't have measurements,
but I do know that about 2 years ago I whacked up a simple interlock and
added it to the pointer, and the overhead was a bit over 10% (it could have
been 15%). I can't find a paper, only a micro benchmark which shows 13%
compared to malloc:
http://www.codeproject.com/Articles/1648/The-fastest-smart-pointer-in-the-west
(this is just an interlock). A trivial Java implementation adding a
field to the header, before modern techniques, was measured at something
like 30% compared to MMTk. In the C++ world 13% is awful, but they
don't have any of the fancy techniques used in Java to pull it down
from 30% to 10%, and to 3% or so for RC-Immix! So an apples-to-apples
comparison with a header might be 15% to 30%. Again, this is complicated by
the fact that in Java everything except a few basic types is an object, and
the numbers may be better with value types.

I'm also saying we should measure it. We do know the cost of headers is
also very significant. And in my experience on modern hw assumptions are
often wrong; trading memory for CPU is nearly always good and improves
overall cache performance (which is an assumption :-) )

>> I think the cost of the object header is greater than we think...
>>
>
> Maybe so. But a significant number of objects do have in-degree higher
> than one, and for those objects it is much better to use a one-word header
> than an additional word per pointer. There are also interop issues when fat
> pointers are used.
>

It's not as clear as that. IF 70-80% of objects have no references (e.g.
strings have no references, nor do objects holding just an int, a matrix,
points, strings themselves, etc.), you would need an average degree of 5 to
equal this, and this is not the case. Now if 50% are interior values or
value objects on stacks and in regions, the figure changes to needing an
average degree of 2.5. On the other hand, if 10% are interior values you are
looking at 4.5, and IMHO the pointer + field may be better. We don't know
this figure, but it makes a significant difference.
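
The arithmetic behind those three figures can be sketched as below. This is
one reading of the model, not measured data: it assumes a header costs one
word per headered object, a fat pointer costs one extra word per reference,
and 80% of objects hold no references. The function name is made up for
illustration.

```python
def break_even_degree(no_ref_pct, interior_pct):
    """Average references per reference-holding object at which one extra
    word per reference costs the same as one header word per headered object.

    Assumed model (a reading of the text, not a measurement):
      - interior_pct of objects are interior/stack/region values, no header
      - no_ref_pct of objects hold no references at all
    """
    headered = 100 - interior_pct      # objects that would pay for a header
    ref_holders = 100 - no_ref_pct     # objects that would pay per reference
    return headered / ref_holders


for interior in (0, 50, 10):
    print(interior, break_even_degree(80, interior))
```

With 80% of objects holding no references this gives 5.0, 2.5 and 4.5 for
0%, 50% and 10% interior values. At 70% the starting figure drops to about
3.3, so the result is quite sensitive to a number nobody has measured yet.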

For your own system, yes, you can add one 64 bit or maybe even 32 bit word.
But the header is determined by the runtime: on 64 bit CLR that's 16 bytes;
Jikes is also 16 bytes. Now these runtimes carry this burden when
compared to C and C++, which is likely a significant factor in their
performance relative to native. So a CLR BitC will carry this cost in any
benchmark against C. And we now have some figures on the header cost. Part
of this does not directly relate to BitC, just my shock at the measured cost
they wear for a 12 (64 bit JVM) to 16 byte (CLR/Jikes) header. If you
extrapolate 3% per 32 bits on 64 bit Jikes, that's 12% plus header
management costs. Is this most of the managed cost: not the GC, but the GC
needing a header? Or is it poor design of the runtimes?
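
For what it's worth, the extrapolation is just a linear scaling, sketched
here. The 3% per 32 bits figure is the one quoted above and is treated as an
assumption; whether cost really scales linearly with header size is exactly
what would need measuring. The function name is hypothetical.

```python
COST_PER_32_BITS_PCT = 3   # figure quoted in the text, assumed to scale linearly


def extrapolated_header_cost_pct(header_bytes):
    """Linear extrapolation: 3% for every 32 bits of object header."""
    words32 = header_bytes * 8 // 32
    return words32 * COST_PER_32_BITS_PCT


for name, size in (("64 bit JVM", 12), ("CLR / Jikes", 16)):
    print(name, size, extrapolated_header_cost_pct(size))
```

That gives 9% for a 12 byte header and 12% for a 16 byte one, before any
header management costs.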

> Finally, I'm not at all convinced that the per-object header can be
> eliminated when fat pointers are used. You only have to need a single bit
> for object forwarding to require a full word of object header.
>

Maybe... there are other ways, and it's a bonus for URC not needing it.

>
>> I'm pretty sure the fastest will be a 32 bit pointer on a 64 bit machine.
>>
>
> I assume you mean to be filtering out the high 32 bits? Then you might be
> surprised. The reduction in D-cache utilization from this is pretty serious.
>

Not filtering, just an indexed 32 bit load instead of a 64 bit load, at +0
or +4 (depending on whether you want the reference or the counts and flags).
The program is obviously limited to 4G (for large memory, run in large mode,
e.g. add a field or mask).
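
A minimal sketch of that 8 byte reference, using byte offsets to mimic the
two 32 bit loads. The split of the second word into 24 count bits and 8 flag
bits is purely an assumption for illustration, as are all the names, and
nothing here is atomic.

```python
import struct

REF = struct.Struct("<II")   # assumed layout: address at +0, count/flags at +4


def make_ref(addr, count, flags):
    # Hypothetical packing: low 8 bits are flags, upper 24 bits are the count.
    return bytearray(REF.pack(addr, (count << 8) | flags))


def load_addr(ref):
    return struct.unpack_from("<I", ref, 0)[0]    # 32 bit load at +0


def load_count_flags(ref):
    word = struct.unpack_from("<I", ref, 4)[0]    # 32 bit load at +4
    return word >> 8, word & 0xFF


def increment(ref):
    count, flags = load_count_flags(ref)
    struct.pack_into("<I", ref, 4, ((count + 1) << 8) | flags)


ref = make_ref(0x10000000, 1, 0b01)
increment(ref)
print(len(ref), hex(load_addr(ref)), load_count_flags(ref))
```

The whole reference is the same 8 bytes as a plain 64 bit pointer, and
reading the address never touches the count word, which is the point about
not filtering or masking.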

I don't see this affecting D-cache (it should improve it, as there is no
header and the pointer is the same size). For extra fields, which is what
you meant, I think it's hard to make a judgement on D-cache utilization.
Maybe you know better, but loading the header to update the count means, to
me, that it's similar. I think it will be proportional to the overall change
in memory usage. You can say an object with lots of references will have
worse D-cache behaviour, but a string, for example, would be better as there
is no reference and no header.

Obviously the pointer wouldn't run on any common/existing VM, but it has a
low cost, would be more competitive with C++ and would probably cover 80%
of implementations soon (assuming most phones follow iPhones to 64 bit).
You then have 32 bit, 64 bit and 64 bit large mode (which has an extra
field). If the header cost is 7-10% you would get most of that back on
64 bit (though I think it will be much less due to embedded value types
and regions).

Interop is worth considering. 2 apps on the same runtime is not an issue,
but for native to runtime this is not trivial. If the increment is done
in the ptr then a C header can do the same thing, mask or extra fields
(and it will work with URC, it just bypasses the nursery), but everything
needs to be a reference, e.g. *bitc_ref or maybe even *bitc_ref<T>. If you're
going with RC-Immix then you need 2 functions for the C app to call from
the header instead of increment/decrement. You can also pass raw BitC
pointers unmodified, but then you need to trust the client, and that seems a
dangerous/bad option. I don't see a huge difference, but it needs more
thought than the 1 minute I just gave it...

>
>
>> So what's the cost of 2 32 bit fields + the header overhead itself, 7-10%?
>>
>
> Fair question, except that I don't see any use case for more than one word
> of object header at this point (unless you count the vtable pointer as part
> of the header, which I do not).
>

2 32 bit + header cost is one word on 64 bit. It's a memory
allocation/copy cost, so it's probably a fraction less than 7-10%, but not
half. Maybe you could use 32 bit headers on a 64 bit machine; if it's
just the type and the count bits it's OK, but the alignment, fragmentation
and wasted space from aligning objects may offset this. The runtime builders
aren't stupid (I hope) and they would have tested more packed headers
(which they use in some environments).
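
To make the alignment point concrete, here is a small sketch of allocated
size with a 4 byte versus an 8 byte header. It assumes objects are rounded
up to an 8 byte boundary and ignores padding between fields inside the
payload; the function name is made up.

```python
def allocated_size(header_bytes, payload_bytes, align=8):
    """Round header + payload up to the next alignment boundary."""
    raw = header_bytes + payload_bytes
    return (raw + align - 1) // align * align


for payload in (4, 8, 12, 16):
    print(payload, allocated_size(4, payload), allocated_size(8, payload))
```

Under those assumptions the smaller header saves 8 bytes for 4 and 12 byte
payloads and nothing at all for 8 and 16 byte payloads, so the gain depends
heavily on the mix of object sizes.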

> What do you imagine lives in that 3-word header?
>

All you need is the type and some flags/counts.

Ben

_______________________________________________
bitc-dev mailing list
[email protected]
http://www.coyotos.org/mailman/listinfo/bitc-dev
