It's sad when you write a response and then feel that a Table of Contents might help to get your readers all the way to the bottom. But it's probably worse not to have one if its presence would help overcome the reluctance.

1. Friendly Opening
   a. Bad Joke
   b. General Agreement
   c. Tread Softly
2. Nick's Proposal
   a. Cognitive Load Defined
   b. Revised Example
   c. Hierarchy Dependent
   d. Error Reporting Suboptimal
   e. Still Fragile
   f. Performance Acceptable
3. Alternative Proposal
   a. Sketchy Code
   b. Example Syntax
   c. Biased Assessment
   d. Restated Requirements
   e. Hope for Future

---

1. Friendly Opening

On Sat, Sep 22, 2012 at 3:29 PM, Marvin Humphrey <[email protected]> wrote:
> We're at an impasse, then.  I'm fundamentally opposed to publishing a
> Clownfish C API where all instance variables are exposed as struct members.

That's an easy one. I'll just try to curl up into a very small ball so it's not hard for the rest of you to step over my dead body. :)

I actually don't disagree that this is a great thing for Clownfish --- it's the necessity for Lucy that I'm questioning. Were Clownfish an independent project, this would be a wonderful direction to go. But since they are entwined, and since you feel strongly that it's needed now, the obvious answer is that we do as you say, avoid straight structs, and come up with the best non-fragile base class system that we can.

Nick: Let me start by saying your proposal is great! I think there's room for improvement, but having a proposal on the table makes it a lot easier to talk about the details.

2. Nick's Proposal

>>> The proposal on the table isn't any more complicated than a struct.  The way
>>> the fields are laid out in memory is identical.
>>
>> While the second sentence is true, I strongly disagree with the first.
>> The cognitive load is extreme.
>
> I don't think that Nick's proposal is hard to understand.  Member variables
> are still accessed via a struct; the interface change is that you have to look
> up that struct first.

We're defining terms differently here. Yes, the proposal is very straightforward. The "cognitive load" is that the user needs to be aware of not only the object as it is, but the full hierarchy.
Or am I misunderstanding how things would work? Let me flesh out your example with some silly worst-case pseudocode:

Classes:

    class Base {
        int id;
        int *type;
    }
    class Parent inherits from Base {
        int field;
        int order;
    }
    class Child inherits from Parent {
        int number;
        int subtype;
    }
    class MyChild inherits from Child {
        int result;
        int subfield;
    }

Old:

    int MyChild_prepare_result(MyChild *self) {
        if (self->field == 1 && self->subfield == 2) {
            self->result = self->order;
        } else {
            self->result = self->type[self->number];
        }
        return self->id;
    }

NickNew:

    int MyChild_prepare_result(MyChild *self) {
        Base *base_ivars = Base_IVARS(self);
        Parent *parent_ivars = Parent_IVARS(self);
        Child *child_ivars = Child_IVARS(self);
        MyChild *mychild_ivars = MyChild_IVARS(self);
        if (parent_ivars->field = 1 && mychild_ivars->subfield == 2) {
            mychild_ivars->result = child_ivars->order;
        } else {
            mychild_ivars->result = base_ivars->type[parent_ivars->number];
        }
        return base_ivars->id;
    }

I think this matches what is proposed? If so, even the scrolling back and forth to get this example right was painful. If it involved flipping back and forth between 4 different files (3 of which are unfamiliar) it would be excruciating. Along those lines, I think I left at least one error in "NickNew" -- how long does it take to spot it?

The "Old" version is easy because one only needs to know the properties of the object, and in a pinch, you can just look at the generated C. Or even easier, your editor or IDE can look at it and provide you with autocompletion for all available struct members. I use emacs, so I'm sure it's possible to make that work for the new version somehow, but I don't expect it to be easy.

My second "complaint" is that despite this, we still end up with base classes that are quite fragile. The benefit (and it is significant) is that we gain the ability to add member variables to the end of the structs for each parent class.
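
A minimal sketch of the per-class lookup, as it can be read from the `*_IVARS(self)` calls above. The `BaseIVARS` and `ParentIVARS` struct names, the `*_ivars_offset` variables, and `bootstrap_offsets()` are assumptions made for illustration and are not taken from the actual proposal; alignment padding between the per-class structs is also ignored here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-class ivar structs (names are illustrative only). */
typedef struct { int id; int *type; } BaseIVARS;
typedef struct { int field; int order; } ParentIVARS;

/* Offsets that each class library would export. They are plain variables,
 * so compiled callers read them at run time instead of baking them in. */
size_t Base_ivars_offset;
size_t Parent_ivars_offset;

/* Assumption: a class's ivars begin where its parent's ivars end. */
static void bootstrap_offsets(void) {
    Base_ivars_offset   = 0;
    Parent_ivars_offset = Base_ivars_offset + sizeof(BaseIVARS);
}

/* Look up the struct first, then access members through it. */
#define Base_IVARS(self) \
    ((BaseIVARS *)((char *)(self) + Base_ivars_offset))
#define Parent_IVARS(self) \
    ((ParentIVARS *)((char *)(self) + Parent_ivars_offset))

/* Allocate a zeroed object big enough for Base + Parent ivars. */
static void *Parent_new(void) {
    return calloc(1, Parent_ivars_offset + sizeof(ParentIVARS));
}

/* Write one field per class, then read both back and add them. */
int Parent_demo(void) {
    bootstrap_offsets();
    void *self = Parent_new();
    assert(self != NULL);
    Base_IVARS(self)->id = 7;
    Parent_IVARS(self)->order = 35;
    int sum = Base_IVARS(self)->id + Parent_IVARS(self)->order;
    free(self);
    return sum;
}
```

In this sketch, appending a member to `BaseIVARS` only changes the value of `Parent_ivars_offset`, which callers read at run time. Inserting a member in the middle of `ParentIVARS` would still shift fields that already-compiled callers address by fixed struct offsets.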
But programmer discipline is needed, for if they are added elsewhere, compiled modules will break even though the local recompile works just fine. Moving a variable from Grandparent to Parent causes the same problems. Deletion is always going to be tricky, but it would be nice to get an error message rather than a segfault. If we are going down this route, I think we should aim for truly robust rather than just less fragile.

On the bright side, I don't think that performance is going to be that big of an issue. Maybe 50 cycles per function for accessing a variable once the lookups are cached in L3? Another dozen for the pipeline to finish adding the offsets? Repeated access should be much faster, and we can easily cache a local pointer for tight loops. Taking a blind stab, maybe a 25% slowdown initially, which we can reduce to 10% by hitting a few hotspots? Making it fast again while gaining the flexibility seems like a fun challenge.

3. Alternative Proposal

If we want greater clarity and robustness, and assuming I'm not missing something obvious about Nick's proposal, I think we'll need to work at the level of the individual variables rather than having a per-class offset. If we're willing to tolerate having lots of new symbols (and I don't see why we shouldn't) I think this can be quite straightforward, both for syntax and implementation. We can sort out the performance later once we see what sort of hit it actually is.

I'd propose something like this:

NateNew:

    // generated by cfish but no typedef
    struct __Child {
        int  id;       // Base
        int *type;     // Base
        int  field;    // Parent
        int  order;    // Parent
        int  number;   // Child
        int  subtype;  // Child
    };

    // public but opaque typedef to boobytrap direct usage
    typedef struct {
        char direct_access_not_allowed[sizeof(struct __Child)];
    } Child_t;

    // symbols in shared object set to values at library compile time
    int __Child_id      = offsetof(struct __Child, id);
    int __Child_type    = offsetof(struct __Child, type);
    int __Child_field   = offsetof(struct __Child, field);
    int __Child_order   = offsetof(struct __Child, order);
    int __Child_number  = offsetof(struct __Child, number);
    int __Child_subtype = offsetof(struct __Child, subtype);

    // approximate macro in public API, likely not working nor free of pitfalls
    #define Child(self, var) (                        \
        *(typeof(((struct __Child *)0)->var) *)       \
        ((char *)(self) + __Child_ ## var)            \
    )
    // macro must have correct type, check that var exists, and work as an lvalue

    // generated by cfish for user class but no typedef
    struct __MyChild {
        int  id;       // Base
        int *type;     // Base
        int  field;    // Parent
        int  order;    // Parent
        int  number;   // Child
        int  subtype;  // Child
        int  result;   // MyChild
        int  subfield; // MyChild
    };

    // opaque typedef generated by cfish for allocation and typechecking
    typedef struct {
        char direct_access_not_allowed[sizeof(struct __MyChild)];
    } MyChild;

    // MyChild(self, var) and the __MyChild_* offsets are generated the same way

    int MyChild_prepare_result(MyChild *self) {
        if (MyChild(self, field) == 1 && MyChild(self, subfield) == 2) {
            MyChild(self, result) = MyChild(self, order);
        } else {
            MyChild(self, result) = MyChild(self, type)[MyChild(self, number)];
        }
        return MyChild(self, id);
    }

I'm sure it's full of egregious holes that need to be patched, but I like that it's a search-and-replace away from struct notation, doesn't require knowledge of the object hierarchy, reports runtime errors at startup, and allows for arbitrary reordering of the base classes.

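
For anyone who wants to try the idea in a compiler, here is a cut-down, compilable variant of the sketch above. It is only an illustration under stated assumptions: it relies on the GCC/Clang `__typeof__` extension, and the names `Child_layout`, `Child_off_*`, `Child_ivar`, and `Child_demo` are made up for this example (chosen partly to avoid identifiers starting with a double underscore, which C reserves). In a real build the offsets would be exported from the shared object rather than defined next to their users.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Full layout: used only for member types, existence checks, and offsets. */
struct Child_layout {
    int  id;       /* Base   */
    int *type;     /* Base   */
    int  field;    /* Parent */
    int  order;    /* Parent */
    int  number;   /* Child  */
    int  subtype;  /* Child  */
};

/* Opaque public type: right size for allocation, no usable members. */
typedef struct {
    unsigned char opaque[sizeof(struct Child_layout)];
} Child;

/* One offset symbol per variable (hypothetical naming scheme). */
const size_t Child_off_id      = offsetof(struct Child_layout, id);
const size_t Child_off_type    = offsetof(struct Child_layout, type);
const size_t Child_off_field   = offsetof(struct Child_layout, field);
const size_t Child_off_order   = offsetof(struct Child_layout, order);
const size_t Child_off_number  = offsetof(struct Child_layout, number);
const size_t Child_off_subtype = offsetof(struct Child_layout, subtype);

/* Typed lvalue access: an unknown `var` fails at compile time because
 * neither the struct member nor the Child_off_ symbol exists. */
#define Child_ivar(self, var) \
    (*(__typeof__(((struct Child_layout *)0)->var) *) \
        ((char *)(self) + Child_off_ ## var))

/* Exercise the macro on both sides of an assignment. */
int Child_demo(void) {
    int table[3] = {10, 20, 30};
    Child *obj = calloc(1, sizeof *obj);
    assert(obj != NULL);
    Child_ivar(obj, id)      = 5;
    Child_ivar(obj, type)    = table;
    Child_ivar(obj, number)  = 2;
    Child_ivar(obj, subtype) = Child_ivar(obj, type)[Child_ivar(obj, number)];
    int result = Child_ivar(obj, id) + Child_ivar(obj, subtype);
    free(obj);
    return result;
}
```

The object is heap-allocated so that alignment is suitable for any member and the byte storage is never accessed through a conflicting declared type.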
Performance will initially be several notches worse than Nick's proposal, but in the same ballpark. But if we can ever figure out how to do a proper Link Time Optimization of the lookups, they both have about the same potential.

There are probably other ways to accomplish this, but the key is that we need:

1) a fleshed-out struct to use for member type, existence, and autocompletion
2) an opaque struct used for object-level type checking and allocation size
3) a macro to access the real offsets that are loaded from the shared object
4) a way to prevent the object from being accessed by any other means
5) a readable syntax that can be used on the left or right of an assignment

I think this is achievable.

--nate
