I'd really like to see that paper, because it sounds like their
implementation was no good. Here is what I need to know:

   1. What was their stack chunk size?
   2. Why did they do the checkpoints this way?

If you are willing to use a guard page, the checkpoint can be done in
*zero* marginal instructions. You have to zero the stack frame in any case,
so you simply make the first instruction of the procedure zero out the
deepest point of the call frame that you will need. If that puts you in the
guard page, you get a page fault and you deal with it.

It also sounds a bit like they weren't using stack chunks the right way.
The point of a stack chunk is to have a way to grow the stack if you need
to. It's not a tool for GC when used sensibly.
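
As a sanity check on the figures in the passage quoted below, here is a
small Python sketch that reproduces the arithmetic. The percentages and
instruction counts are taken from the quote; the names are made up for
illustration, and the "checkpoint at every call site" case is only one
assumed reading of how the 20-30% estimate might arise, not something the
paper states.

```python
# Share of all instructions that are calls (from the quoted passage).
CALL_FRACTION = 0.05

def per_call_overhead(external_fraction):
    """Average extra instructions per call site, using the quoted figures.

    external_fraction is the share of call sites that link a large chunk
    for an external function (quoted as 0.4% to 0.5%).
    """
    cases = [
        (0.001, 27),              # checkpoint links a new stack chunk
        (external_fraction, 20),  # large chunk linked unconditionally
        (0.10, 6),                # checkpoint finds no new chunk is needed
    ]
    return sum(fraction * cost for fraction, cost in cases)

low = per_call_overhead(0.004)
high = per_call_overhead(0.005)

# Scale by the share of instructions that are calls.
overall_low = low * CALL_FRACTION
overall_high = high * CALL_FRACTION

# Assumption (not from the paper): every call site pays the cheap
# 6-instruction checkpoint and nothing else.
every_site = 6 * CALL_FRACTION

print(f"per-call slowdown: {low:.1%} to {high:.1%}")
print(f"overall slowdown:  {overall_low:.2%} to {overall_high:.2%}")
print(f"checkpoint at every call site: {every_site:.0%}")
```

Under those assumptions the per-call and overall numbers land inside the
ranges the quote gives, and the every-call-site case lands at the top of the
20-30% estimate.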


shap


On Mon, Nov 4, 2013 at 12:52 AM, Ben Kloosterman <[email protected]> wrote:

> As far as segmented stacks go, I posted this from a paper on the rust list.
> This is from getting Apache to use segmented stacks in C.
>
> " At 0.1% of call sites, checkpoints
> caused a new stack chunk to be linked, at a cost of 27
> instructions. At 0.4–0.5% of call sites, a large stack chunk
> was linked unconditionally in order to handle an external
> function, costing 20 instructions. At 10% of call sites, a
> checkpoint determined that a new chunk was not required,
> which cost 6 instructions. The remaining 89% of call sites
> were unaffected. Assuming all instructions are roughly equal
> in cost, the result is a 71–73% slowdown when considering
> function calls alone. Since call instructions make up only
> 5% of the program’s instructions, the overall slowdown is
> approximately 3% to 4%"
>
> Which also means that if you instrument every method, your cost could well
> be in the 20-30% overall range. So without significant analysis
> (preferably whole-program) it's not really viable. 3-4% is a fair price
> to pay if you can do such analysis, especially with high thread counts (as
> you win some back by leaving more memory for the heap).
>
> Ben
>
>
>
>
>
>
> On Fri, Nov 1, 2013 at 10:16 PM, Jonathan S. Shapiro <[email protected]> wrote:
>
>> On Thu, Oct 31, 2013 at 6:15 PM, Ben Kloosterman <[email protected]> wrote:
>>
>>>
>>>> I'm talking about allocating a large array of some particular object
>>>> type and then doling out interior pointers as the objects are allocated.
>>>> It's an array of objects, not an array of allocators.
>>>>
>>>> What do you get by using value types and non boxed array?
>>>
>>
>> The ability to control placement and locality.
>>
>>
>>>  ... Linux kernel is whole program and very few apps will reach that
>>>>> scale.
>>>>>
>>>>
>>>> Linux is not compiled whole program. It is compiled one source unit at
>>>> a time.
>>>>
>>>>
>>>
>>> Pretty sure you could in theory compile all those files to LLVM IR (if
>>> not for some non-C-compliant code).
>>>
>>
>> Not in reality. Lots of dynamic loading in the Linux kernel. But in any
>> case, that's not how it's done.
>>
>>
>> shap
>>
>> _______________________________________________
>> bitc-dev mailing list
>> [email protected]
>> http://www.coyotos.org/mailman/listinfo/bitc-dev
>>
>>
>