>> I have never cared that much about memory usage; 150% or 200% of the
>> minimum heap I'm fine with, and reduced pauses are a nice-to-have (which
>> regions and better stack-based algorithms will deliver anyway).
>>
>
> One of the things that I find curious is that there are no measurements in
> the literature for malloc/free under constrained total heap size, and also
> that nobody talks about the overhead of the arena management structures. In
> most mallocs there is an 8 to 16 byte overhead per allocated object (on
> 32-bit systems).  Java object headers generally run 8 bytes. Where I'm
> going here is that the required heap sizes may be closer than people
> think, simply because corresponding studies do not appear to have been done.
>

Agreed, and there is more, e.g. the cost of SLUB etc. to make the OS memory
manager efficient for malloc. (I don't see recent papers changing the OS MM
either, and SLUB is tuned for malloc, but I think raw buddy-style
allocators work better for GCs because they make fewer but larger
power-of-two allocations.) Also, the programmer is often forgotten: if he
wants to write a memory-constrained algorithm after seeing or predicting
that memory is an issue, he can, and it certainly will not be one that
creates huge numbers of new objects, which changes the whole profile of
when memory is an issue. Embedded value types help a lot here.


>
>> But I have changed my position a bit:
>> 1) 10-20% just does not matter if you can express algorithms in SIMD
>> (see my other ad nauseam posts), and I'd much rather have basic RC as
>> pointed out, and SIMD expression as part of the language, than a full
>> RC-Immix system. You get easier-to-write SIMD in the language, it can be
>> called by native programs, and you have a game changer - you need not
>> much else.
>>
>
> I think we all know by now that you are very excited about SIMD. I
> understand that mono isn't doing a good job, but it's not hard to do.
>


No safe implementations so far; I'd like to see a paper :-) ... I'm not
talking about simple intrinsics, I'm talking about expressing things in
such a way that the compiler generates SIMD. Obviously copying VecImp
should be OK (at least you save a lot of design work), but:
- Can you do better by expressing problems in a way that is easier to
vectorize (e.g. something like an expanded form, or collection operations
like C# LINQ, may help)?
- Will it work well for safe languages? VecImp targets an unsafe C-like
language; maybe they see safe languages as having few value types and
hence SIMD as more difficult, but the emphasis on unsafe worries me even
though I can see no reason for it.
- One of the biggest pains of SIMD is getting the data into SIMD
registers; doing more "normal" work in SIMD registers and loading them
efficiently is an "art" at present. (Obviously the more you already have
in SIMD registers, the cheaper it is to load them; loading them
sequentially from normal registers is not efficient.)

I'm not a language guy (at best learning), but it doesn't seem that easy
to design this, as we really don't know how to express SIMD algorithms
well, or, more importantly, algorithms in a form that can be mapped to
SIMD.

If it's easy we should get it out there ;-) ... people are desperate for a
better way of writing portable, safe SIMD instead of intrinsics or inline
asm; you don't need much else. Functions, some data types, memory safety
and an interface with C (both ways).


>
>> 2) The cost of concurrency is shocking me, and I don't really see the
>> reasons behind it. In languages with more pure functions it makes
>> sense, but with Java or C# it makes little sense. You have mutable data
>> and 90% of the code will fail without a lock; every class in MSDN is
>> marked as to whether it's thread safe.
>>
>
> What numbers are you looking at that are shocking you? In languages with
> pure functions there is essentially *no* cost to concurrency, so I don't
> understand what you are saying there. In Java and C# the problem boils down
> to maintaining memory consistency at the hardware level, which is
> inherently expensive.
>


- E.g. for non-concurrent code, simple RC without the lock and atomic op
basically becomes a conditional and a store to the header (or pointer),
both of which will be written anyway at creation and often afterwards. I
have seen simple single-threaded smart pointers cost 10% vs malloc (not
vs a GC), but not with smart_ptr (20%), and on multiple cores or CPUs the
concurrent smart pointers degrade rapidly.
- Due to concurrency safety, bounds-check elimination is difficult;
that's a big cost.
- ARM executes out of order, and then we may throw in memory
barriers/fences for single-threaded code that MAY be concurrent.

You can scrape back some of this with much more complex algorithms, which
incur further development costs, so there is a runtime-complexity /
development-cost trade-off.

Pure functional languages (or ones with a high amount of pure code) don't
have mutable data, so you don't need to manually ensure the data is
accessed in a safe way. So there it makes a lot of sense (in a backwards
way, because their guarantees are cheap), but when you have lots of
mutable data the guarantee loses, IMHO, most of its value, because you
need to ensure the code is concurrency-correct anyway.

Obviously you need to throw in these guarantees for concurrent code and
to write simpler locks etc., but I'd like a runtime/compiler guarantee
that if I'm running single-threaded I do not pay the penalty :-) ... LLVM
does nothing here, so it's all up to the compiler creating the IR, so it
should not be that hard: if (singleThreaded) emit ... else ...



>
> So I'm thinking it's better to have structures in the standard lib which
>> help guarantee concurrency, not the runtime, since you need locks and a
>> careful design anyway. Or at least tell the JIT which to emit. Maybe
>> for loops, have for and foreach as well as concurrent foreach and
>> concurrent for.
>>
>
> Implementation of concurrent algorithms has to be done in the language
> (possibly in the library). *Safety* of concurrency has to be done in the
> compiler/runtime.
>

I don't think the distinction is that clear. Safety of concurrency is
done by the compiler/runtime if you guarantee safety, but if you have a
partial guarantee then there could be interaction with language features
(and possibly attributes or helpers in a lib).

I suppose the real gripe is not some safety in the compiler/runtime but a
blanket guarantee of concurrency safety. On x86 at least, the guarantee is
worthless unless you are using correct algorithms and designs, and those
don't help most people making thread-safe objects, since they normally
use safe functions or just slap a lock around things. So who are you
helping? A small percentage of already highly skilled people who write
locks or non-locking concurrent code. (On ARM it's more useful.)

I'm not saying you can't have a whole lib/assembly marked safe because
it's multi-threaded, but why should my single-threaded or LMAX/Actor
pattern code be burdened with concurrency safety when my competitors in
JavaScript, C and C++ are not? Especially when the development cost is
low.

This sort of design, "Lightweight Concurrency Primitives for GHC", is of
interest as well, and would allow adding, say, transactional memory,
though it's more about implementing the VM:
http://research.microsoft.com/en-us/um/people/simonpj/papers/lw-conc/lw-conc.pdf

Ben
_______________________________________________
bitc-dev mailing list
[email protected]
http://www.coyotos.org/mailman/listinfo/bitc-dev
