Greets,
A few months back, I attempted to tackle Clownfish's "fragile base class"
problem with regard to instance variables, using techniques inspired by
classic implementations of multiple inheritance in C++. Nick Wellnhofer
objected to the proposal's complexity, among other things:
http://s.apache.org/cTL
    All things considered, I'm not sure if prepending the data of subclasses
    in front of the object is worth all the trouble.
I'd like to offer Nick belated thanks for his critique, and try again. :) I
continue to believe that fixing this issue is an imperative for Clownfish. It
is not acceptable for users who install a new version of Lucy from CPAN to
have to track down compiled extensions to Lucy and rebuild them. We've
already solved the hardest ~75% of the fragile ABI issue with our vtable and
method dispatch design -- let's finish the job!
Here is an article explaining how the problem was addressed in Objective-C
2.0:
http://cocoawithlove.com/2010/03/dynamic-ivars-solving-fragile-base.html
    The "modern" Objective-C runtime therefore requires that accessing an ivar
    follows this modified approach:
      1. Add the offset for the subclass' instance value area to the object's
         pointer value
      2. Add the offset from the subclass' instance area to the ivar
      3. Dereference (read or write from) the memory location referred to by
         the offset pointer value
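For anyone who'd rather read C than prose, here is my own rough rendering of those three steps. It is only a sketch and not taken from the Objective-C runtime: `Sub_ivars_START`, `SUB_COUNT_REL_OFFSET` and `Sub_count_address` are names I made up for illustration.

```c
#include <stddef.h>

/* Hypothetical: where the subclass's instance area begins within the
 * object.  Filled in at load time, once the superclass size is known. */
size_t Sub_ivars_START;

/* Hypothetical: position of the ivar `count` within the subclass's own
 * area, known when the subclass is compiled. */
#define SUB_COUNT_REL_OFFSET 0

static inline int*
Sub_count_address(void *self) {
    char *area = (char*)self + Sub_ivars_START;   /* step 1 */
    char *ivar = area + SUB_COUNT_REL_OFFSET;     /* step 2 */
    return (int*)ivar;                /* step 3: caller dereferences */
}
```

Only `Sub_ivars_START` depends on the superclass layout, so that is the one value which has to be resolved at load time rather than compile time.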
Here's Wikipedia's explanation of C#'s approach:
http://en.wikipedia.org/wiki/Fragile_binary_interface_problem#Languages
    Another solution is to write out an intermediate file listing the offsets
    and other information from the compile stage, known as meta-data. The
    linker then uses this information to correct itself when the library is
    loaded into an application. Platforms such as .NET do this.
Both languages are doing something similar to what Clownfish does with its
method OFFSET variables (and quite different from C++), suggesting a direction
for us to explore. Here's a proof of concept[1] in which we define a macro[2]
`Foo_num(self)` which allows access to a member variable `num` of type
`int` as both an lvalue and an rvalue:
    #include <stddef.h>  /* size_t */

    static inline void*
    SI_member_address(void *self, size_t offset) {
        return (char*)self + offset;
    }

    typedef struct Foo Foo;

    #pragma GCC visibility push(hidden)
    extern size_t Foo_num_OFFSET;
    #pragma GCC visibility pop

    #define Foo_num(self) \
        (*((int*)SI_member_address(self, Foo_num_OFFSET)))

    void
    Foo_set_num(Foo *self, int num) {
        Foo_num(self) = num;
    }

    int
    Foo_get_num(Foo *self) {
        return Foo_num(self);
    }
I can think of at least three drawbacks to this approach.
First, we must continue to compile with `-fno-strict-aliasing` and forgo the
optimizations it enables.
Second, using variable offsets instead of constant offsets makes each member
var access marginally less efficient. Since we can calculate these offsets at
load time and after that they never change, it seems like we ought to be able
to exploit that information -- but maddeningly, I have not yet figured out a
way.
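To make "calculate these offsets at load time" concrete, here is a minimal sketch of what a bootstrap step might look like. It is not part of the proof of concept: `Foo_INSTANCE_SIZE`, `S_align_up` and `Foo_bootstrap_offsets` are hypothetical names, I'm assuming the parent class's instance size is available from runtime metadata, and `_Alignof` assumes a C11 compiler.

```c
#include <stddef.h>

/* Offsets are ordinary variables, resolved once at load time. */
size_t Foo_num_OFFSET;
size_t Foo_INSTANCE_SIZE;   /* hypothetical: total size of a Foo */

/* Round `n` up to a multiple of `align` (which must be a power of two). */
static size_t
S_align_up(size_t n, size_t align) {
    return (n + align - 1) & ~(align - 1);
}

/* Hypothetical bootstrap routine.  `parent_size` is read from the parent
 * class at runtime rather than from a header seen at compile time, so the
 * parent can add members without breaking compiled subclasses. */
void
Foo_bootstrap_offsets(size_t parent_size) {
    Foo_num_OFFSET    = S_align_up(parent_size, _Alignof(int));
    Foo_INSTANCE_SIZE = Foo_num_OFFSET + sizeof(int);
}
```

If a later release of the parent grows from 8 bytes to 16, `Foo_num_OFFSET` simply comes out as 16 on the next load, with no rebuild of the extension. The price is the one described above: the compiler sees a variable, not a constant.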
Third, the macro syntax is a tad verbose and awkward (more so in lvalue
context)...
    self->num = num;       // now
    Foo_num(self) = num;   // proposed
... and it would also require extensive superficial changes to the Lucy core
codebase to switch all direct struct access over to the macro form. :\
There's an alternative: change the meaning of `->` for Clownfish objects in
our .c files (which would no longer be true C as a result). That would
require writing a parser which understands C, which is doable, but ambitious.
If we take this route, I think we could actually stop creating C struct
definitions for Clownfish classes and use the macros for all access. Not that
that's crucial, but it's funny (and instructive) to think that our opaque
structs would be opaque everywhere.
Thoughts?
Marvin Humphrey
[1] See also LUCY-234: http://s.apache.org/bqi
[2] These macros would be parcel-scoped, to avoid the trap of inheritance
breaking encapsulation.