Re: [lucy-dev] Fragile base class problem, revisited

Marvin Humphrey Tue, 18 Sep 2012 11:16:37 -0700

On Mon, Sep 17, 2012 at 8:49 PM, Nathan Kurz <[email protected]> wrote:
> On Mon, Sep 17, 2012 at 3:31 PM, Marvin Humphrey <[email protected]> wrote:
>> On Mon, Sep 17, 2012 at 1:24 PM, Nathan Kurz <[email protected]> wrote:
>>> http://blog.omega-prime.co.uk/?p=121
>>
>> Rats.  I wish what was described in that post actually worked.  It would be
>> so nice if we could alias e.g. `lucy_Query_to_string_OFFSET` to
>> `cfish_Obj_to_string_OFFSET`.
>>
>> Even better if we could alias to a constant, so that e.g.
>> `lucy_Query_to_string_OFFSET` could be replaced by `72` at link-time.
>
> I'm hoping there still is a way, and hoping that way isn't too awful.
> I mean, it's obviously possible, just a question of how low we have to
> go.  Do you suppose it would be awkward to distribute our own
> dynamic linker? :)


Heh.  You could say that's how Java and C# solve problems like this, if you
think of JIT compilers as linkers on steroids. :)

In keeping with the metaphor of a clownfish forming a symbiotic relationship
with its host anemone, we're trying to work with the existing system for
linking.  So far we're doing pretty well.

These OFFSET variable lookups are merely a question of optimization.  We
haven't even done significant benchmarking to measure how much they cost --
the only data point we have is that when I marked all methods in InStream and
OutStream as `final` (so that those method symbols resolve as direct function
invocations rather than go through vtable dispatch) there wasn't a measurable
change on our indexing benchmark.

Weak symbols seem tantalizing but I'm not sure they can help:

    http://en.wikipedia.org/wiki/Weak_symbol

The Lucy DSO could define `cfish_Obj_to_string_OFFSET` as a weak symbol, which
would be overridden by the version supplied by the Clownfish DSO.  But even
then, each access would still be a variable lookup, and would involve an extra
level of indirection via the GOT[1] to boot.  Nick's approach of having each
DSO maintain its own OFFSET variables with `hidden` visibility (which also
allows us to reduce the _number_ of OFFSET variables) seems better.
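
A rough sketch of what a per-DSO OFFSET variable with `hidden` visibility
might look like.  Everything prefixed `demo_` is hypothetical, the one-slot
vtable is not the real Clownfish layout, and the visibility attribute assumes
GCC or Clang.  With hidden visibility the compiler can typically address the
variable directly rather than through the GOT.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical one-method object model, for illustration only. */
typedef struct demo_Obj demo_Obj;
typedef const char *(*demo_to_string_t)(demo_Obj *self);

typedef struct demo_VTable {
    demo_to_string_t to_string;
} demo_VTable;

struct demo_Obj {
    demo_VTable *vtable;
};

/* Assumes GCC/Clang attribute syntax; expands to nothing elsewhere. */
#if defined(__GNUC__)
  #define DEMO_HIDDEN __attribute__((visibility("hidden")))
#else
  #define DEMO_HIDDEN
#endif

/* Each DSO keeps its own copy; the symbol is not exported. */
DEMO_HIDDEN size_t demo_Obj_to_string_OFFSET;

/* Run once at load time, when the real vtable layout is known. */
void
demo_bootstrap(void) {
    demo_Obj_to_string_OFFSET = offsetof(demo_VTable, to_string);
}

/* Method invocation: one variable lookup, then an indirect call. */
static inline const char*
demo_Obj_to_string(demo_Obj *self) {
    char *slot = (char*)self->vtable + demo_Obj_to_string_OFFSET;
    return (*(demo_to_string_t*)slot)(self);
}

static const char*
demo_to_string_impl(demo_Obj *self) {
    (void)self;
    return "Obj";
}

/* Build an object and dispatch through the OFFSET variable. */
const char*
demo_dispatch_example(void) {
    demo_VTable vtable = { demo_to_string_impl };
    demo_Obj    obj    = { &vtable };
    demo_bootstrap();
    return demo_Obj_to_string(&obj);
}
```

The lookup is still a variable read, so this only trims the indirection; it
doesn't turn the offset into a link-time constant.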

FWIW, here's somebody else trying to do the same thing we are:

    http://stackoverflow.com/questions/9753723/c-constants-defined-at-link-time

> The 'ioctl' wouldn't be a 'perform' function, but a link to a
> substruct.  If the padding ran out and we desperately needed to add
> another variable to a parent class, we'd add a pointer to a struct in
> the last space.  Then one would reference foo->extra->new1,
> foo->extra->new2.   This would hold us until the next major release,
> when we'd clean up, fold this into the main struct, and add extra
> padding if still needed. Not too pretty, but a decent backstop.  And
> if we plan ahead as to where additions are most likely, we shouldn't
> ever have to resort to it.
>
> I love the simplicity of a simple struct: no macros, no accessors, no
> opacity.

I can't get on board with "no opacity" being a good thing. :\

Keeping struct definitions opaque is fundamental information hiding.  It's
problematic if either users or compilers "know" the offsets at which member
variables may be accessed.  If users "know" offsets, some fraction of them
are going to manipulate member variables directly, piercing encapsulation.
If compilers "know" offsets, we get fragile base class problems because of
incorrect compile-time assumptions that those offsets are constant forever.
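
To make the compiler half of that concrete, here is a small sketch with
made-up struct names.  It models a parent class before and after gaining a
member, and shows that the offset a subclass compiled against the old parent
would have baked in no longer matches the new layout.

```c
#include <assert.h>
#include <stddef.h>

/* Parent class as the subclass saw it at compile time. */
typedef struct ParentV1 {
    int a;
} ParentV1;

typedef struct ChildV1 {
    ParentV1 parent;
    int      num;
} ChildV1;

/* Parent class after a later release adds a member. */
typedef struct ParentV2 {
    int a;
    int b;
} ParentV2;

typedef struct ChildV2 {
    ParentV2 parent;
    int      num;
} ChildV2;

/* Offset of `num` baked into code compiled against the old parent. */
size_t
stale_num_offset(void) {
    return offsetof(ChildV1, num);
}

/* Offset of `num` in objects laid out against the new parent. */
size_t
actual_num_offset(void) {
    return offsetof(ChildV2, num);
}
```

A subclass binary that is not recompiled keeps using the stale offset, and so
reads or writes the parent's new member instead of its own.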

Expecting programmers to reserve slots with dummy variables is a grotesque
workaround and a huge burden.  It might be the only way to make C++ work, but
can't we learn from the past rather than repeat it?

> If there is any way we can keep it for instance variables, I
> think we should.

The proposal on the table isn't any more complicated than a struct.  The way
the fields are laid out in memory is identical.

The only difference is that unlike Java and C#, C only provides native syntax
for accessing a member variable when its offset is known at compile-time.
Some parts of C are antiquated.  This is one.

For the user, accessing a member vars struct via an inline function like
`Foo_GET_IVARS()` is no more onerous than accessing a nested struct:

    int
    Foo_get_num(Foo *self) {
        return Foo_GET_IVARS(self)->num;
    }

    int
    Foo_get_num(Foo *self) {
        return self->foo_extra->num;
    }

Furthermore, the quasi-accessor does the right thing by default, while direct
access, which is meant to be simple, ironically requires that users possess a
sophisticated understanding of the "fragile base class" problem in order to
program safely.
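
For the sake of discussion, a sketch of how such a quasi-accessor could be
built on an offset that is only known at load time.  `FooIVARS`,
`Foo_IVARS_OFFSET`, `Foo_bootstrap()` and `Foo_new()` are hypothetical names,
`num` is assumed to be an `int`, and the parent size passed in is assumed to
be suitably aligned for `FooIVARS`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct Foo Foo;        /* opaque: users never see the layout */

typedef struct FooIVARS {
    int num;
} FooIVARS;

/* Where Foo's own member variables start; set at load time. */
static size_t Foo_IVARS_OFFSET;

/* Record the parent's size once it is known at runtime. */
void
Foo_bootstrap(size_t parent_size) {
    Foo_IVARS_OFFSET = parent_size;
}

/* The quasi-accessor: pointer arithmetic on a runtime offset. */
static inline FooIVARS*
Foo_GET_IVARS(Foo *self) {
    return (FooIVARS*)((char*)self + Foo_IVARS_OFFSET);
}

/* Allocate room for the parent's members plus Foo's own. */
Foo*
Foo_new(int num) {
    Foo *self = (Foo*)calloc(1, Foo_IVARS_OFFSET + sizeof(FooIVARS));
    if (self != NULL) {
        Foo_GET_IVARS(self)->num = num;
    }
    return self;
}

int
Foo_get_num(Foo *self) {
    return Foo_GET_IVARS(self)->num;
}
```

If the parent grows in a later release, only the value handed to
`Foo_bootstrap()` changes; `Foo_get_num()` does not need recompiling.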

Marvin Humphrey

[1] GOT: Global Offset Table
    http://en.wikipedia.org/wiki/Position-independent_code#Technical_details
