On Thu, Oct 17, 2013 at 4:45 PM, Jonathan S. Shapiro <[email protected]> wrote:
> On Thu, Oct 17, 2013 at 11:31 AM, Ben Karel <[email protected]> wrote:
>
>> On Thu, Oct 17, 2013 at 2:02 PM, Jonathan S. Shapiro <[email protected]> wrote:
>>
>>> Thanks, Ben.
>>>
>>> From the outside, looking from the GC-inclined perspective, it seems
>>> unfortunate that LLVM doesn't have a distinguished register class in the
>>> IR for object references as opposed to pointers. This would go a long way
>>> toward solving the stack map problem. On the other hand, I haven't thought
>>> hard enough about whether the presence of GCREAD/GCWRITE coupled with
>>> local dataflow might be enough.
>>
>> What stack map problem?
>
> The stack map problem is to keep track of which words on the stack hold
> pointers at any given time. It's a map taking as input a PC and an SP and
> producing as output a set of frame-relative word offsets for the words
> that contain object references.
>
> Last time I checked, LLVM provided *zero* assist for this.

According to my 40-second Internet Archive query, LLVM has had documented
stack map support since 2007. You knew this once upon a time ;)

  https://groups.google.com/d/msg/llvm-dev/M4HOyteR4J4/VNOQX0Q7YysJ

> The solution people were using was to adopt the threaded-stack approach
> originally proposed by Fergus Henderson [*]. In essence, the front end
> gathers all of the object references into an on-stack shadow stack, and
> then arranges to "leak" the address of the struct outside the procedure.
> The effect is that any temporary register containing an object reference
> becomes non-live at every procedure call and has to be re-loaded from the
> struct. This ensures that there are no live object references in registers
> at any time when a collection might occur.
>
> [*] Fergus Henderson, Accurate Garbage Collection in an Uncooperative
> Environment
>
> The overhead is just as high as the Boehm collector, which is well known
> to have much higher overhead than GCs designed for safe languages (as it
> should - Boehm is solving a *much* harder problem). The fact that the
> Henderson overhead is this high is primarily due to the inability to
> preserve object references in registers across procedure calls. The
> importance of that is *astonishing*.
>
> In a non-relocating collector, or in a collector that does not relocate
> root-referenced objects, it *would* be possible to preserve object
> referencing registers. In that model, the Henderson mechanism can be
> viewed as a clever trick for identifying roots that can be used in a
> compiler that is otherwise uncooperative. In effect, the front end is
> building the stack map explicitly.
>
>>> The "no objects in registers" rule, though, is *very* expensive. At
>>> first glance, it would seem to restrict the compiler to the
>>> Henderson-style approach. The overhead of those spills has been
>>> measured. It's pretty big.
>>
>> I'm unclear on what you think "no objects in registers" (again, that
>> word!) means, or what alternative you have in mind. Could you clarify?
>> And also cite the study you have in mind?
>
> If you don't know which registers contain object references, you have two
> choices:
>
>    1. Ensure that any object which *might* be referenced by a register
>    (conservatively) is not relocated. This precludes mark-compact and
>    generally makes relocation a pain.
>    2. Ensure that no object references are allowed to remain in registers
>    during a collection (this is what Henderson is doing).
>
> So basically, those pesky registers really *are* out to get you.

:-)
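(For readers who haven't run into Henderson's paper, here is a minimal
sketch of the kind of IR a front end using that technique might emit. The
frame layout, the @shadow_stack_top global, and all of the names are
illustrative assumptions on my part, not taken from the paper or from any
particular implementation.)

  %shadowframe = type { %shadowframe*, i8* }

  @shadow_stack_top = global %shadowframe* null

  ; any call that might trigger a collection
  declare void @consume(i8*)

  define i8* @example(i8* %x) {
  entry:
    %frame = alloca %shadowframe
    ; chain this frame into the global list, "leaking" its address
    %prev = load %shadowframe** @shadow_stack_top
    %link = getelementptr %shadowframe* %frame, i32 0, i32 0
    store %shadowframe* %prev, %shadowframe** %link
    ; record the object reference where the collector can find (and update) it
    %slot = getelementptr %shadowframe* %frame, i32 0, i32 1
    store i8* %x, i8** %slot
    store %shadowframe* %frame, %shadowframe** @shadow_stack_top
    call void @consume(i8* %x)
    ; the frame's address has escaped, so this load cannot be folded away;
    ; the (possibly updated) reference must come back from memory
    %x.reloaded = load i8** %slot
    ; unchain the frame on the way out
    store %shadowframe* %prev, %shadowframe** @shadow_stack_top
    ret i8* %x.reloaded
  }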
My objection isn't to the idea of registers -- I'm quite fond of $ebp
myself -- but the semantic confusion that comes from the different notion
of the word "register" in the context of a concrete ISA versus LLVM's IR.

Maybe a vague analogy will help. Signed overflow in C being undefined
behavior was a relatively obscure fact until recently. Most programmers
figured, hey, I compiled and ran a program with signed overflow just the
other day, and it clearly had two's-complement behavior, so that's what I
expect to get tomorrow. This conflates the behavior of the concrete x86
assembly with the abstract C machine corresponding to the source program.
Model vs implementation.

Likewise, here there is (or, rather, should be!) a distinction between x86
registers and SSA bindings, which implies a difference between the
treatment of x86 registers at safe points and the treatment of LLVM's SSA
bindings at safe points.

There are two angles to this. The key point is that SSA bindings are
immutable. Not merely immutable given the set of observations afforded by
the source language being compiled to LLVM, but immutable to any
observation possible in LLVM IR itself (such as, say, printf("%p", ...)).
If an LLVM binding pointing to an object in the GC heap were modified as a
result of garbage collection, this fundamental property of the IR would be
violated. So there must be two LLVM bindings: one for the value of the
heap address before the potential GC occurs, and one for the value
afterwards. Often they will be the same address, when GC isn't actually
invoked. Sometimes they will be different, when a GC does occur.

Another way of seeing the SSA-vs-x86-register distinction is this: x86
registers are, fundamentally, just names for memory locations. LLVM
registers are NOT memory locations -- they are a disjoint space of values!
Think Harvard-architecture-style separation, if that helps.

An example, to be a bit more concrete:

  %root = alloca i32*
  ...
  store i32* %x, i32** %root
  %0 = call i32 @might_result_in_copying_gc()
  %y = load i32** %root

The common interpretation of this code is "spill the pointer x onto the
stack, make a function call, then reload the register from the stack", and
that's what LLVM's current implementation will do. But the pure-SSA
viewpoint is more interesting: it says this means "before the call, I'm
referring to the value of the %root memory location as %x, and afterwards,
I'm referring to it as %y." Now it's clearer that the %root value could
just as easily be stored in an x86 register as in a stack slot (assuming
the %root slot address doesn't escape). Model vs implementation.
Destroying the immutable-value aspect of the SSA IR for the sake of not
having to explicitly write that store/load pair is throwing out the baby
and keeping the bathwater.

So that's why I suggested that saying "register" alone, without being
clearer about ISA register vs SSA binding, can muddy the waters and lead
to confusion.
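To connect this back to the %root example: the same before-and-after
pattern is what LLVM's documented GC support hooks into. A sketch using
the @llvm.gcroot intrinsic (the root slot is an i8** alloca in the entry
block, the function carries a gc attribute, and the "shadow-stack"
strategy plus the bitcasts are just one plausible way to set it up):

  declare void @llvm.gcroot(i8** %ptrloc, i8* %metadata)
  declare i32 @might_result_in_copying_gc()

  define i32* @example(i32* %x) gc "shadow-stack" {
  entry:
    %root = alloca i8*
    call void @llvm.gcroot(i8** %root, i8* null)
    %x.erased = bitcast i32* %x to i8*
    store i8* %x.erased, i8** %root
    %0 = call i32 @might_result_in_copying_gc()
    ; a fresh SSA binding for the (possibly updated) heap address
    %y.erased = load i8** %root
    %y = bitcast i8* %y.erased to i32*
    ret i32* %y
  }

Note that %x and %y remain distinct, immutable bindings; any mutation
happens in the rooted memory slot, never in an SSA value.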
> Though now that I think about it, the only objects we *really* need to
> avoid moving are the ones that are actually referenced from registers,
> and a tri-color technique can find those pretty easily. The first problem
> is that it leaves fragmentation in new space, where you *really* want to
> be able to play nice with the bump allocator. The second problem is that
> registers may hold interior pointers, and even posterior pointers, that
> are object references in disguise. This second problem is why the
> Henderson technique ensures that all object referencing registers are
> non-live. That also ensures that dependent interior temporaries die at
> procedure calls.
>
>>> There is a stage in LLVM where hard registers are selected. Do the
>>> GCREAD/GCWRITE annotations survive to that point?
>>
>> No, the intrinsics are lowered to LLVM source-level IR.
>
> In that case you have absolutely no way to know which hard registers hold
> object references, and no way to tell the optimizer that (e.g.) using an
> interior pointer to index over an array is a bad idea. That seems
> unfortunate.
>
>> I agree in general about pointer-kind distinctions; the question is,
>> should those distinctions be made *in LLVM IR* or in a pre-LLVM frontend
>> IR? The natural inclination is to do it at the LLVM level, but I'd argue
>> the latter makes for a cleaner overall design.
>
> They cannot be done correctly in a pre-LLVM front-end IR. If the mid-end
> IR doesn't preserve the register classifications, then it is impossible
> for the back end to know which hard registers hold object references, and
> *that* is what we wanted to know here. It's also impossible for the
> mid-end optimizer to know when optimizations involving interior pointers
> should be avoided.
>
> So I think it's fundamental. What's needed from an IR perspective is:
>
>    1. LDREF / STREF instructions
>    2. A new register class for object references. This is conceptually
>    similar to the existing register classes that distinguish floating
>    point and integer values.
>
> Then you go through and make sure that all of the inner-pointer
> optimizations are properly sensitized to register class. The easiest way
> to ensure that is to make sure that the IR doesn't include an add
> operation that takes an IR register in the object-reference register
> class.
>
> Then at the end you tweak the hard register allocator to keep track of
> what's what.

Do you have a specific example of LLVM IR which is sub-optimally compiled
due to the lack of what you have proposed? Attempting to find such an
example might be a valuable exercise for the skeptical reader. ;-)

I should probably end by acknowledging, again, that LLVM's GC support is
not perfect. There are enhancements to be made, as soon as someone steps
up to contribute patches. But I have come to believe that the shortcomings
almost entirely fall under "relatively minor implementation details"
rather than "serious and fundamental design flaws."
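P.S. For anyone attempting the exercise above, one place to start looking
might be derived pointers held live across calls. A purely illustrative
fragment (not a demonstration of an actual miscompile):

  declare void @might_result_in_copying_gc()

  define i32 @second_element(i32* %base) gc "shadow-stack" {
  entry:
    ; an interior pointer, derived from %base
    %elt = getelementptr i32* %base, i64 1
    call void @might_result_in_copying_gc()
    ; if the collector moved the object %base points into, %elt is stale;
    ; nothing here tells the optimizer it must be re-derived from a root
    %v = load i32* %elt
    ret i32 %v
  }

(Neither %base nor %elt is registered as a root here, which is rather the
point.)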
_______________________________________________
bitc-dev mailing list
[email protected]
http://www.coyotos.org/mailman/listinfo/bitc-dev
