>
> On Fri, Apr 8, 2011 at 4:31 AM, Jonathan S. Shapiro <[email protected]> wrote:

On Thu, Apr 7, 2011 at 5:39 PM, Ben Karel <[email protected]> wrote:
>
>> As the quote there says, using a shadow stack is the easiest way to get
>> started with GC in LLVM.
>
>
> Yes. And since my comments lend themselves to being misread, let me say
> this more clearly:
>
> 1. I think it's *great* that the LLVM sample infrastructure provides a
> low-overhead point of entry for people who want to play with GC.
> 2. The presence of any particular bit of sample code certainly does not
> indicate any flaw in LLVM.
>
>
>> , but it's not really "the default" in any sense except being moderately
>> easier than designing and writing a plugin to compile stack maps, and
>> runtime code to walk the stack...
>
>
> This is the place where the problem arises in my mind. I should start by
> acknowledging that the stack map plugin functionality in LLVM is new, and
> that I haven't yet tried to explore it in detail.
>
> Here are a couple of concerns that I have:
>
> 1. Last time I looked, LLVM lacked the capability to produce register type
> maps at call sites.
> 2. The LLVM optimization infrastructure does not guarantee type
> preservation - even low level type preservation - in the optimization
> process.
> 3. Last I looked, the LLVM infrastructure did not adequately describe
> internal pointers in the low-level types.
>

I'm not sure what constitutes "adequate description", but the gcread/gcwrite
primitives take both object and slot pointers. It's not clear to me what the
problem is with leaving clients to track the connection. If the semantics of
that pair is violated, well, that's not a design flaw; that's a flat-out
miscompilation.
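For what it's worth, having both the object and the slot pointer is exactly what a generational write barrier wants. A minimal sketch in plain C of the idea behind a gcwrite-style barrier (all names and the remembered-set layout here are mine, purely illustrative — not LLVM's actual runtime code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical object header; a real runtime's layout will differ. */
typedef struct Obj {
    bool marked_old;      /* object lives in the old generation */
    bool in_remset;       /* already recorded in the remembered set */
    struct Obj *fields[4];
} Obj;

#define REMSET_MAX 64
static Obj *remset[REMSET_MAX];
static size_t remset_len = 0;

/* Analogue of an llvm.gcwrite barrier: because it sees both the
 * containing object and the exact slot being mutated, it can record
 * old-to-young pointers without rescanning the whole object. */
static void gc_write(Obj *obj, Obj **slot, Obj *value) {
    if (obj->marked_old && value && !value->marked_old && !obj->in_remset) {
        assert(remset_len < REMSET_MAX);
        remset[remset_len++] = obj;
        obj->in_remset = true;
    }
    *slot = value;
}
```

If the front-end or an optimization pass ever emits a raw store where this barrier was required, the remembered set silently goes stale — which is the "flat out miscompilation" case above.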

Of these three, the second and third are the larger concerns. Issue [2] is a
> polite way of saying that people who write optimization passes targeting C,
> C++, and Fortran don't tend to know, understand, or give a hoot about the
> constraints that need to be preserved in a GC-supporting optimization pass.
> Perhaps LLVM has done better, but the absence of any discussion of
> requirements along these lines that passes are expected to obey does make me
> wonder.
>

Could you give a concrete example of an LLVM optimization pass that destroys
type information in a way that undermines GC? I'd like to experiment with my
LLVM-based GC and see if it breaks!

It's also worth noting that there are neither required nor default
optimization passes, at either the LLVM or MC levels. It's all opt-in...
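For readers following along, the shadow-stack scheme discussed at the top of this thread can be sketched in portable C with no compiler support at all — which is both its appeal and its cost. This is the general Henderson-style technique, not LLVM's actual runtime code:

```c
#include <stddef.h>

/* One frame's worth of GC roots, pushed on entry and popped on exit. */
typedef struct ShadowFrame {
    struct ShadowFrame *prev;
    size_t num_roots;
    void **roots[8];          /* addresses of local pointer variables */
} ShadowFrame;

static ShadowFrame *shadow_top = NULL;

static void push_frame(ShadowFrame *f) { f->prev = shadow_top; shadow_top = f; }
static void pop_frame(void)            { shadow_top = shadow_top->prev; }

/* The collector walks every frame and visits each root slot; no
 * compiler-generated stack maps are needed. Returns the root count. */
static size_t visit_all_roots(void (*visit)(void **slot)) {
    size_t n = 0;
    for (ShadowFrame *f = shadow_top; f; f = f->prev)
        for (size_t i = 0; i < f->num_roots; i++, n++)
            if (visit) visit(f->roots[i]);
    return n;
}
```

The cost is that every function with GC'd locals pays the push/pop and must keep its roots pinned in memory rather than in registers — which is the overhead motivating compiled stack maps in the first place.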


>  Issue [3] is actually an example of many. More generally, there are a
> range of issues identified in David Chase's 1987 thesis *Garbage
> Collection and Other Optimizations*, and a subsequent refining paper
> written jointly with Boehm, that the LLVM infrastructure simply doesn't address.
> Similarly, the kind of approach suggested in his work with Detlefs, *Garbage
> Collection and Local Variable Type-Precision and Liveness*, and its later
> refinements, doesn't seem possible in the LLVM framework (I would love to be
> mistaken).
>

Thanks for the pointers! I'll have to check those papers out.


>
>
>> And, regardless of whether you use compiled stack maps or the shadow
>> stack, the actual collector -- the trickiest bit by far -- is still up to
>> you.
>>
>
> Absolutely. And in the absence of a compiler that preserves the necessary
> information across phases, there is absolutely no point trying to deploy any
> of the good - or even moderately good - GC strategies.
>

Could you summarize what information LLVM currently loses in practice, in
what phases?

>> High-performance garbage collectors are strongly tied to a language
>> implementation's object model and semantics.
>>
>
> This is not as true as people want to believe. It is true in the sense that
> there are significant design concerns in the layout and content of object
> headers in some GC systems. And there are certainly *ad hoc* design
> decisions one can make if one knows that, say, cons cells are unusually
> prominent in the data set. But the reality today is that there are no
> single-language address spaces, and consequently, an effective collector
> simply *can't* depend heavily on the high-level object model.
>

That's one reason why it's useful to have types in your front-end -- the
address space as a whole may be heterogeneous, but types can show that
subsets of the address space are entirely under the control of the language
runtime. That guarantee is modulo intentional or unintentional twiddling
from external sources, but if you don't have that guarantee, you're screwed
on correctness, not just performance.

Custom GC effectively means control over custom allocators. Isn't one of the
main points of BitC, and Cyclone, that fine-grained control over memory is
important for performance?
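To make the "subsets of the address space" point concrete: a toy bump allocator in C that carves objects out of a region the runtime owns outright, so membership — and hence "this pointer obeys our object model" — is a cheap test. (Entirely illustrative and my own naming; a real runtime would align per-type, grow the region, and actually collect.)

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define HEAP_SIZE 4096

/* A region wholly owned by the language runtime: the collector may
 * assume every object inside it obeys the language's object model. */
static uint8_t heap[HEAP_SIZE];
static size_t heap_used = 0;

static void *gc_alloc(size_t n) {
    n = (n + 7) & ~(size_t)7;                    /* 8-byte alignment */
    if (heap_used + n > HEAP_SIZE) return NULL;  /* would trigger a GC */
    void *p = &heap[heap_used];
    heap_used += n;
    return p;
}

/* Cheap membership test: is this pointer managed by our runtime? */
static bool gc_owns(const void *p) {
    const uint8_t *q = (const uint8_t *)p;
    return q >= heap && q < heap + HEAP_SIZE;
}
```

Pointers from the rest of the heterogeneous address space simply fail `gc_owns` and are left to whatever manages them — modulo the external-twiddling caveat above.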


> We don't need collectors that are good at Java - or rather, we do, but not
> in a native-targeting compiler. What we need is the information from the
> compiler to let us collect successfully in mixed-language heaps.
>
> And yes, I know the compiler is only part of the problem.
>
>
> shap
>
> _______________________________________________
> bitc-dev mailing list
> [email protected]
> http://www.coyotos.org/mailman/listinfo/bitc-dev
>
>
On Fri, Apr 8, 2011 at 4:11 AM, Jonathan S. Shapiro <[email protected]> wrote:

> On Thu, Apr 7, 2011 at 7:59 AM, Ben Karel <[email protected]> wrote:
>
>> On Mon, Apr 4, 2011 at 8:42 AM, Ben Kloosterman <[email protected]>
>>  wrote:
>>
>>> I will second that the whole shadow stack is a bad approach from the
>>> start; it needs a redesign. E.g., apps can emit GC code snippets. While not
>>> too difficult to do, it's an architectural change which requires more from
>>> the application, and hence some politics..
>>>
>>
>> Wait, who said anything about a shadow stack?
>>
>
> Last time I looked, the basic GC model in LLVM was to use a shadow stack,
> and LLVM lacked the necessary infrastructure to build, e.g., call frame
> register maps.
>
>
>> ...it's not a question of [the LLVM team] sticking their heads in the sand
>> about GC, it's simply that the core clients (and sponsors) of LLVM are for
>> non-GCed languages. So they are rationally allocating their resources
>> towards improvements that benefit all LLVM clients, and assuming that those
>> clients who are not satisfied with the current infrastructure will improve
>> it as they find necessary.
>>
>
> Compiler infrastructures take a long time to build, and are long term
> investments. They involve core technical decisions that are *very* difficult
> to change (e.g. the IR design). For this reason, successful compiler
> infrastructures simply cannot afford to succumb to the kind of short-term
> thinking you espouse. In this case, the core technical decisions in question
> go all the way back to Chris's Masters work, so I have to say that I don't
> buy the sponsorship argument.
>

I think you're not giving the LLVM team enough credit for their capacity to
evolve their infrastructure if they see the need. And this is an opportune
time for "breaking" changes to LLVM's infrastructure, since the 2-to-3
transition is in the queue. So:

Could you explicate what you perceive to be the fundamental design flaws,
and how the LLVM IR needs to be changed?

In any case, if the LLVM team believes that a sufficiently motivated party
> can do an incremental evolution from where they are to a GC-supporting IR, I
> think they are being unrealistic. It isn't a simple set of changes. The level
> of disruption to work currently in progress is simply too high, the
> negative performance impact is likely to be significant, and short-term
> competitive pressure militates against both. Chris is not the least bit
> naive about these concerns.
>
> I find the LLVM infrastructure to be a disappointment from this
> perspective. Given how recently it was designed, and the clearly desperate
> need to deal with mixing managed and non-managed languages, the decision to
> omit core support for managed languages from the LLVM design strikes me as a
> bad design decision. Microsoft has definitely gotten this right; I think
> Apple has not. The decision to *implement* LLVM in a non-managed language
> also strikes me as ill chosen, though that decision makes more sense to me
> situationally.
>

Let he who is without sin cast the first stone, eh? :-)
