On SIMD, Halide is interesting: halide-lang.org. But it is probably too
specialised for BitC.
Alex
From: Bennie Kloosteman <[email protected]>
>To: Discussions about the BitC language <[email protected]>
>Sent: Monday, October 14, 2013 5:07 AM
>Subject: Re: [bitc-dev] GC, RC, and real time.
>
>
>
>
>>
>>
>>
>>I have never cared that much about memory usage; 150% or 200% of the minimum
>>heap is fine with me, and reduced pauses are a nice-to-have (which regions and
>>better stack-based algorithms will deliver anyway).
>>
>>
>>One of the things that I find curious is that there are no measurements in
>>the literature for malloc/free under constrained total heap size, and also
>>that nobody talks about the overhead of the arena management structures. In
>>most mallocs there is an 8 to 16 byte overhead per allocated object (on
>>32-bit systems). Java object headers generally run 8 bytes. Where I'm going
>>here is that the required heap sizes may be closer than people think,
>>simply because corresponding studies do not appear to have been done.
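To put rough numbers on the per-object overhead point, here is a minimal sketch in C. It assumes only the header sizes quoted above (8 to 16 bytes for malloc, 8 bytes for a Java object header); the helper name `footprint_percent` and the 24-byte payload used below are illustrative, not taken from any study.

```c
#include <assert.h>
#include <stddef.h>

/* Total footprint of one object as a percentage of its payload,
 * given a fixed per-object header. Hypothetical helper; integer
 * division, so the result is rounded down. */
static unsigned footprint_percent(size_t payload, size_t header)
{
    assert(payload > 0);
    return (unsigned)(((payload + header) * 100) / payload);
}
```

For a 24-byte payload this gives 133 with an 8-byte header and 166 with a 16-byte header, so for small objects the header alone is a large fraction of the footprint under either scheme.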
>
>
>Agreed, and there is more, e.g. the cost of SLUB etc. to make it efficient for
>malloc. (I don't see them changing the OS MM in many recent papers either.
>SLUB is tuned for malloc, but I think raw buddy-style allocators work better
>for GCs because they make fewer but larger power-of-two allocations.) Also, the
>programmer is often forgotten: if he wants to write a memory-constrained
>algorithm after seeing or predicting that memory is an issue, then he can, and
>it certainly will not involve huge numbers of new objects, which changes the
>whole profile of when memory is an issue. Embedded value types help a lot here.
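As an illustration of the "fewer but larger power-of-two allocations" remark, here is a sketch of the size rounding a buddy-style allocator performs. The name `buddy_order` and the 4096-byte minimum block are assumptions for the example, not taken from any particular kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MIN_BLOCK ((size_t)4096)  /* assumed smallest block (one page) */

/* Return the order k such that MIN_BLOCK << k is the smallest
 * power-of-two block that can hold `bytes`. */
static unsigned buddy_order(size_t bytes)
{
    /* Upper bound keeps the shift below from overflowing. */
    assert(bytes > 0 && bytes <= SIZE_MAX / 2 + 1);
    unsigned order = 0;
    size_t block = MIN_BLOCK;
    while (block < bytes) {
        block <<= 1;
        order++;
    }
    return order;
}
```

A collector that requests memory in a few large chunks only ever hits a handful of these orders, which is the access pattern a plain buddy allocator handles well.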
>
>
>
>>
>>But I have changed my position a bit:
>>>
>>>1) 10-20% just does not matter if you can express algorithms in SIMD (see
>>>my other ad nauseam posts), and I'd much rather have basic RC as pointed out
>>>and SIMD expression as part of the language than a full RC-Immix system. If
>>>you get easier-to-write SIMD in the language and it can be called by native
>>>programs, you have a game changer; you need not much else.
>>
>>
>>I think we all know by now that you are very excited about SIMD. I understand
>>that mono isn't doing a good job, but it's not hard to do.
>
>
>
>
>There are no safe implementations so far; I'd like to see a paper :-) I'm not
>talking about simple intrinsics. I'm talking about expressing things in such a
>way that the compiler generates SIMD. Obviously copying VecImp should be OK (at
>least you save a lot of design), but:
>- Can you do better by expressing problems in a way that is easier to vectorize
>(e.g. something like an expanded form or collection operations like C# LINQ
>may help)?
>- Will it work well for a safe language? VecImp mentioned an unsafe C-like
>language. Maybe they see safe languages as having few value types, so SIMD is
>more difficult, but the emphasis on unsafe worries me even though I can see no
>reason for it.
>- One of the biggest pains of SIMD is getting the data into SIMD registers.
>Doing more normal work in SIMD registers and loading them efficiently is an
>"art" at present. (Obviously the more you already have in SIMD registers, the
>cheaper it is to load them; loading them from normal registers sequentially
>is not efficient.)
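One common way to ease the register-loading problem in the last bullet is to keep data in contiguous arrays so the compiler can use wide loads. Below is a minimal sketch in C, relying on compiler auto-vectorization (typically at higher optimization levels), which is not guaranteed. The function name `saxpy` is just an illustrative choice.

```c
#include <assert.h>
#include <stddef.h>

/* y[i] = a * x[i] + y[i] over contiguous float arrays.
 * `restrict` promises the arrays do not overlap, which removes the
 * aliasing hazard that often blocks vectorization. */
static void saxpy(size_t n, float a,
                  const float *restrict x, float *restrict y)
{
    assert(x != NULL && y != NULL);
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

Note that this is unsafe C: nothing checks that `n` matches the array lengths or that the no-overlap promise holds, which is exactly the gap a safe language would have to close.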
>
>
>I'm not a language guy (at best learning), but it doesn't seem that easy to
>design this, as we really don't know how to express SIMD algorithms well or,
>more importantly, algorithms in a form that may be used for SIMD.
>
>
>If it's easy we should get it out there ;-) People are desperate for a
>better way of writing portable, safe SIMD instead of intrinsics/inline asm.
>You don't need much else: functions, some data types, memory safety and an
>interface with C (both ways).
>
>
>
>>
>>2) The cost of concurrency is shocking me, and I don't really see the
>>reasons behind it. In languages with more pure functions it makes sense, but
>>with Java or C# it makes little sense. You have mutable data and 90% of the
>>code will fail without a lock; every class is marked in MSDN as to whether
>>it's thread safe.
>>
>>
>>What numbers are you looking at that are shocking you? In languages with pure
>>functions there is essentially no cost to concurrency, so I don't understand
>>what you are saying there. In Java and C# the problem boils down to
>>maintaining memory consistency at the hardware level, which is inherently
>>expensive.
>
>
>
>
>- E.g. for non-concurrent code, simple RC: without the lock and atomic op, RC
>basically becomes a conditional and a store in the header (or pointer), both of
>which will be written to anyway at creation and often afterwards. I have
>seen simple single-threaded smart pointers cost 10% vs malloc (not vs a GC)
>but not with smart_ptr (20%), and when doing it on multiple cores or multiple
>CPUs the concurrent smart_ptrs degrade rapidly.
>- Due to concurrent safety, bounds check elimination is difficult; that's a
>big cost.
>
>- ARM has out-of-order instructions, but then we may throw in memory barriers/
>fences for single-threaded code that MAY be concurrent.
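To make the first bullet concrete, here is a sketch of the two retain/release paths, assuming a C11 compiler with `<stdatomic.h>`. All type and function names are hypothetical. The single-threaded version is a plain update of a header word plus a conditional; the thread-safe version uses atomic read-modify-write operations, which on x86 typically compile to lock-prefixed instructions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

typedef struct { unsigned rc; }    obj_st;  /* single-threaded header */
typedef struct { atomic_uint rc; } obj_mt;  /* thread-safe header */

static void retain_st(obj_st *o) { o->rc++; }

/* Returns true when the last reference is dropped. */
static bool release_st(obj_st *o)
{
    assert(o->rc > 0);
    return --o->rc == 0;
}

static void retain_mt(obj_mt *o)
{
    atomic_fetch_add_explicit(&o->rc, 1, memory_order_relaxed);
}

/* Returns true when the last reference is dropped. acq_rel ordering
 * makes earlier writes to the object visible before it is freed. */
static bool release_mt(obj_mt *o)
{
    return atomic_fetch_sub_explicit(&o->rc, 1, memory_order_acq_rel) == 1;
}
```

The two versions have identical semantics on one thread; the difference is only in what the hardware has to do to keep other cores consistent.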
>
>
>
>You can scrape back some of these with much more complex algorithms, which
>incur further development costs, so there is a runtime complexity/development
>cost trade-off.
>
>
>
>Pure functional languages (or those with a high proportion of pure code) don't
>have mutable data, so you don't need to manually ensure the data is accessed in
>a safe way. So it makes a lot of sense (in a backwards way, because their
>guarantees are cheap), but when you have lots of mutable data the guarantee
>loses, IMHO, most of its value because you need to ensure the code is safe for
>concurrent use anyway.
>
>
>Obviously you need to throw in these guarantees for concurrent code and to
>write simpler locks etc., but I'd like a runtime/compiler guarantee that if I'm
>running single-threaded I do not pay the penalty :-) LLVM does nothing, so
>it's all up to the compiler creating the IR, so it should not be that hard:
>if (singleThreaded) emit ... else ...
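The `if (singleThreaded) emit ... else` idea can be approximated at the source level with a build-time switch. This is a sketch only, assuming a hypothetical `SINGLE_THREADED` macro supplied by the build; a real compiler would make the choice while generating IR. The type `mailbox` and function `publish` are illustrative names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* With SINGLE_THREADED defined, only a compiler barrier is emitted
 * (no fence instruction). Otherwise a release fence is emitted, which
 * on weakly ordered CPUs such as ARM becomes a real barrier. */
#ifdef SINGLE_THREADED
#  define PUBLISH_FENCE() atomic_signal_fence(memory_order_release)
#else
#  define PUBLISH_FENCE() atomic_thread_fence(memory_order_release)
#endif

typedef struct {
    int        payload;
    atomic_int ready;
} mailbox;

/* Write the payload, order it before the flag, then raise the flag. */
static void publish(mailbox *m, int value)
{
    assert(m != NULL);
    m->payload = value;
    PUBLISH_FENCE();
    atomic_store_explicit(&m->ready, 1, memory_order_relaxed);
}
```

The catch is the one raised in the thread: the switch is only safe if something actually guarantees that the code never runs concurrently.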
>
>
>
>
>>
>>So I'm thinking it's better to have structures in the standard lib which help
>>guarantee concurrency, not the runtime, since you need locks and a careful
>>design anyway. Or at least tell the JIT which to emit. Maybe for loops: have
>>for and foreach as well as concurrent foreach and concurrent for.
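An existing analogue of a separate "concurrent for" is OpenMP's `parallel for` pragma in C, where the programmer opts in loop by loop. The sketch below uses hypothetical function names. The pragma only takes effect when the compiler is run with OpenMP enabled; otherwise both functions compile to the same sequential loop.

```c
#include <assert.h>
#include <stddef.h>

/* Plain "for": always sequential, no concurrency cost. */
static void scale_seq(size_t n, float *v, float k)
{
    assert(v != NULL);
    for (size_t i = 0; i < n; i++)
        v[i] *= k;
}

/* "Concurrent for": iterations may be split across threads when
 * OpenMP is enabled. Each iteration touches a distinct element,
 * so no locking is needed. */
static void scale_par(size_t n, float *v, float k)
{
    assert(v != NULL);
#ifdef _OPENMP
#pragma omp parallel for
#endif
    for (long i = 0; i < (long)n; i++)
        v[i] *= k;
}
```

Nothing here checks that the loop body is actually free of shared mutable state; that burden stays with the programmer, which is the safety question debated in this thread.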
>>
>>
>>Implementation of concurrent algorithms has to be done in the language
>>(possibly in the library). Safety of concurrency has to be done in the
>>compiler/runtime.
>
>
>I don't think the distinction is that clear. Safety of concurrency is done
>by the compiler/runtime if you guarantee safety, but if you have a partial
>guarantee then there could be interaction with language features (and
>possibly attributes or helpers in a lib).
>
>
>I suppose the real gripe is not some safety in the compiler/runtime but a
>blanket guarantee for safety of concurrency. On x86 at least, the guarantee is
>worthless unless you are using correct algorithms and designs, and it doesn't
>help most people making thread-safe objects, as they normally use safe
>functions or just slap a lock around. So who you are helping is a small % of
>already highly skilled people who write locks or non-locking concurrent
>code. (On ARM it's more useful.)
>
>
>I'm not saying you can't have a whole lib/assembly marked safe since it's
>multi-threaded, but why should my single-threaded or LMAX/Actor pattern code be
>burdened with concurrency safety when my competitors in JavaScript, C and C++
>are not? Especially when the development cost is low.
>
>
>This sort of design, "Lightweight Concurrency Primitives for GHC", is of
>interest as well and would allow adding, say, transactional memory, though it's
>more for implementing the VM:
>
>http://research.microsoft.com/en-us/um/people/simonpj/papers/lw-conc/lw-conc.pdf
>
>
>
>
>
>Ben
>
>
>_______________________________________________
>bitc-dev mailing list
>[email protected]
>http://www.coyotos.org/mailman/listinfo/bitc-dev
>
>
>