On Sun, Mar 29, 2009 at 11:48 PM, Marvin Humphrey
<[email protected]> wrote:
> On Sun, Mar 29, 2009 at 08:36:57AM -0400, Michael McCandless wrote:
>
>> Are VTables also considered Obj's?
>
> Yes. They are structs, with the last member being a "flexible array member"
> which can hold any number of "method_t" pointers. Here's the list of members
> (omitting host_obj and vtable, which are inherited).
>
> VTable *parent;
> CharBuf *name; /* class name */
> u32_t flags;
> size_t obj_alloc_size;
> size_t vt_alloc_size;
> Callback **callbacks;
> boil_method_t methods[1]; /* flexible array */
>
> I've included the generated code for the VARRAY VTable below my sig. It's a
> little messy because some mild hacks are needed to satisfy C89 and portability
> constraints.
What is the vtable of VTABLE (the VTable for VTables)? Itself?
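
For anyone following along, here is a minimal sketch of the layout described above: a struct whose last member is a one-element array that gets over-allocated to hold the method pointers. Every name in it (DemoVTable, demo_method_t, DemoVTable_new and so on) is made up for illustration and is not Boilerplater output; the real struct has more members.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration; not Boilerplater output. */
typedef void (*demo_method_t)(void);

typedef struct DemoVTable {
    struct DemoVTable *parent;
    const char        *name;           /* class name */
    size_t             vt_alloc_size;
    demo_method_t      methods[1];     /* C89-friendly flexible array */
} DemoVTable;

/* Allocate a vtable with room for num_methods method slots. */
DemoVTable*
DemoVTable_new(DemoVTable *parent, const char *name, size_t num_methods)
{
    size_t extra = num_methods > 1
                 ? (num_methods - 1) * sizeof(demo_method_t)
                 : 0;
    size_t size = sizeof(DemoVTable) + extra;
    DemoVTable *self = (DemoVTable*)calloc(1, size);
    assert(self != NULL);
    self->parent        = parent;
    self->name          = name;
    self->vt_alloc_size = size;
    return self;
}

/* Slot access goes through helpers so the index is a runtime value. */
void
DemoVTable_set_method(DemoVTable *self, size_t slot, demo_method_t method)
{
    self->methods[slot] = method;
}

demo_method_t
DemoVTable_get_method(const DemoVTable *self, size_t slot)
{
    return self->methods[slot];
}

/* A sample method that just counts its invocations. */
int demo_calls = 0;
void demo_hello(void) { demo_calls++; }
```

The one-element trailing array is the old "struct hack"; with C99 available the member could be declared as methods[] instead, but that is an assumption about the portability targets, not something stated in this thread.
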
>> Do VTables only store methods? Or can they store fields as well?
>
> You mean could they store class data? I suppose they could. Initializing
> might get a little messy, and I'd want to make sure that we locked them down
> and made them stateless before threading starts.
I actually meant data fields on the object, but class data ("static"
in Java) is also good.
>> Can an arbitrary Obj at runtime become a VTable for another Obj?
>> (True "prototype" programming language). Seems like "no", because an
>> arbitrary Obj is not allowed to add new members wrt its parent (only
>> new VTables can do so).
>
> Correct.
>
>> Does VARRAY (VTable for VArray objects) hold a reference to OBJ?
>
> Yes: self->vtable->parent. The Obj_Is_A() method, similar to Java's
> "instanceof", follows the chain upwards:
>
> bool_t
> Obj_is_a(Obj *self, VTable *target_vtable)
> {
> VTable *vtable = self ? self->vtable : NULL;
> while (vtable != NULL) {
> if (vtable == target_vtable) { return true; }
> vtable = vtable->parent;
> }
> return false;
> }
OK.
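
A self-contained version of that parent-chain walk, as a sketch: the three vtables and the Sketch_ names are hypothetical, and the hierarchy (VArray and CharBuf both deriving from Obj) is only assumed for the sake of the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down types; the real structs have more members. */
typedef struct SketchVTable {
    struct SketchVTable *parent;
    const char          *name;
} SketchVTable;

typedef struct SketchObj {
    SketchVTable *vtable;
} SketchObj;

/* Assumed hierarchy: Obj <- VArray, Obj <- CharBuf. */
SketchVTable SKETCH_OBJ     = { NULL,        "Obj"     };
SketchVTable SKETCH_VARRAY  = { &SKETCH_OBJ, "VArray"  };
SketchVTable SKETCH_CHARBUF = { &SKETCH_OBJ, "CharBuf" };

/* Return 1 if self's class is target or descends from it, else 0. */
int
Sketch_is_a(const SketchObj *self, const SketchVTable *target)
{
    const SketchVTable *vtable = self ? self->vtable : NULL;
    assert(target != NULL);
    while (vtable != NULL) {
        if (vtable == target) { return 1; }
        vtable = vtable->parent;
    }
    return 0;
}
```

Because the check compares vtable addresses, it costs one pointer comparison per level of inheritance and needs no string comparison on class names.
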
>> How are these trees of VTables init'd?
>
> All core classes have VTables which are global structs initialized at compile
> time. Boilerplater spits them out into a file called "boil.c".
>
> User subclass VTables are allocated on the fly at runtime.
OK.
>> And Lucy objs are single inheritance.
>
> Correct.
>
>> So a VArray is allowed to have C NULLs in its elems array (vs., say, Java,
>> which always inits the array to hold Java nulls).
>
> That's how it's implemented now. Changing it over to something like Java
> nulls might be a good idea.
That would save having to NULL check, though it's wasteful if the code
that created the VArray will then immediately fill it in.
>> Is there an explicit object in Lucy that represents null (java), None
>> (Python), etc.?
>
> Yes: UNDEF, a singleton belonging to the class Undefined.
> However, it's not used very much.
OK.
>> > First, note that the destructor for VArray invokes the destructor of its
>> > parent class, Obj. This superclass call makes it possible for us to add
>> > members to a base class without having to manually edit all subclasses.
>>
>> Great.
>
> The same is true at construction time; each class implements two functions,
> new() and init(), and subclasses call their parent class's init():
>
> TermQuery*
> TermQuery_new(const CharBuf *field, const Obj *term)
> {
> TermQuery *self = (TermQuery*)CREATE(NULL, TERMQUERY);
> return TermQuery_init(self, field, term);
> }
>
> TermQuery*
> TermQuery_init(TermQuery *self, const CharBuf *field, const Obj *term)
> {
> Query_init((Query*)self, 1.0f);
> self->field = CB_Clone(field);
> self->term = Obj_Clone(term);
> return self;
> }
OK.
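
To spell out why new() and init() are split, here is a rough standalone sketch of the same pattern. The Toy names are hypothetical, plain C strings stand in for CharBuf and Obj, and calloc stands in for CREATE; none of this is the actual Lucy code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parent class with one member. */
typedef struct ToyQuery {
    float boost;
} ToyQuery;

/* Hypothetical subclass: parent members first, then its own. */
typedef struct ToyTermQuery {
    ToyQuery  base;
    char     *field;
    char     *term;
} ToyTermQuery;

/* Local string copy helper, standing in for a Clone method. */
static char*
toy_strdup(const char *s)
{
    size_t len  = strlen(s) + 1;
    char  *copy = (char*)malloc(len);
    assert(copy != NULL);
    memcpy(copy, s, len);
    return copy;
}

/* init() only fills in members; it never allocates the object itself. */
ToyQuery*
ToyQuery_init(ToyQuery *self, float boost)
{
    self->boost = boost;
    return self;
}

ToyTermQuery*
ToyTermQuery_init(ToyTermQuery *self, const char *field, const char *term)
{
    ToyQuery_init((ToyQuery*)self, 1.0f);   /* superclass init first */
    self->field = toy_strdup(field);
    self->term  = toy_strdup(term);
    return self;
}

/* new() allocates for exactly this class, then delegates to init(). */
ToyTermQuery*
ToyTermQuery_new(const char *field, const char *term)
{
    ToyTermQuery *self = (ToyTermQuery*)calloc(1, sizeof(ToyTermQuery));
    assert(self != NULL);
    return ToyTermQuery_init(self, field, term);
}

void
ToyTermQuery_destroy(ToyTermQuery *self)
{
    free(self->field);
    free(self->term);
    free(self);
}
```

A further subclass would allocate its own, larger struct in its new() and then call ToyTermQuery_init() on it, so a member added to ToyQuery later is picked up without touching any subclass.
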
>> I'm a bit confused: what if you have a Lucy obj, that's got a cached
>> host obj, such that the host obj is not referred to anywhere in the
>> host language, but is referred to in Lucy, and Lucy finally decrefs
>> its last reference.
>
> At that point, the Lucy object and the Python object share a single refcount,
> which resides in the Python object's ob_refcnt member. When you call Py_DECREF
> on the Python object and ob_refcnt falls to 0, Python then invokes the
> "__del__" method. We will define "__del__" so that it invokes Destroy().
AHH, sorry, I see now.
Hmm... Python's cyclic collector won't collect cycles involving
classes that have __del__ since it can't guess a safe order to run the
__del__ methods. (I know we're expecting people to just avoid making
cycles, but still important to know).
>> How is the cycle broken in that case?
>
> There's no reference cycle because the Lucy object and the Python object share
> a single unified refcount. They have C pointers which point at each other, but
> they don't hold refcounts open against each other.
>
>> (Ie, Destroy should be invoked via Lucy).
>
> The call sequence will be:
>
> Obj_Dec_Refcount(lucy_obj);
> Py_DECREF(lucy_obj->host_obj);
> python_obj.__del__()
> Obj_Destroy(lucy_obj);
OK.
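
The call sequence above can be mimicked without Python at all. The sketch below is only a model: FakeHost plays the part of the Python object, its refcnt plays ob_refcnt, and the branch for a NULL host_obj assumes that "no host object yet" means Lucy holds the one and only reference. All names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct FakeHost FakeHost;
typedef struct FakeObj  FakeObj;

struct FakeHost {
    long     refcnt;      /* plays the role of ob_refcnt */
    FakeObj *lucy_obj;    /* back-pointer; holds no refcount */
};

struct FakeObj {
    FakeHost *host_obj;   /* NULL: refcount is implicitly 1 */
};

int fake_destroy_count = 0;

FakeObj*
FakeObj_new(void)
{
    FakeObj *self = (FakeObj*)calloc(1, sizeof(FakeObj));
    assert(self != NULL);
    return self;
}

/* Hypothetical helper: cache a host object carrying the given refcount. */
void
FakeObj_attach_host(FakeObj *self, long refcnt)
{
    FakeHost *host = (FakeHost*)calloc(1, sizeof(FakeHost));
    assert(host != NULL && self->host_obj == NULL);
    host->refcnt   = refcnt;
    host->lucy_obj = self;
    self->host_obj = host;
}

void
FakeObj_destroy(FakeObj *self)
{
    fake_destroy_count++;
    free(self->host_obj);
    free(self);
}

/* Models the host decref: at zero, the finalizer calls Destroy. */
static void
fake_host_decref(FakeHost *host)
{
    assert(host->refcnt > 0);
    if (--host->refcnt == 0) {
        FakeObj_destroy(host->lucy_obj);
    }
}

void
FakeObj_dec_refcount(FakeObj *self)
{
    if (self->host_obj == NULL) { FakeObj_destroy(self); }
    else                        { fake_host_decref(self->host_obj); }
}
```

There is only one counter per object pair, so the two structs can point at each other without forming a reference cycle.
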
>> OK. That 1 refCount "belonging" to Lucy. This is essentially an
>> efficient way to represent the common case of "only Lucy has a single
>> reference to this object".
>
> Yes. In addition, it allows us to initialize structs at compile-time.
>
> CharBuf snapshot_new = {
> (VTable*)&CHARBUF, /* vtable */
> NULL, /* <------ NULL host_obj */
> "snapshot.new", /* ptr */
> 12, /* size */
> 13 /* capacity */
> };
>
> Since Python objects live on the heap, we can't create them at compile-time.
> The NULL conveniently stands in for them.
Great.
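
A compilable miniature of that idea, under stated assumptions: MiniCharBuf and MockHost are hypothetical types, the vtable slot is left NULL because no vtable is defined here, and the refcount accessor assumes that a NULL host_obj means a refcount of exactly 1.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a heap-allocated host object such as a PyObject. */
typedef struct MockHost {
    long refcnt;
} MockHost;

typedef struct MiniCharBuf {
    const void *vtable;
    MockHost   *host_obj;
    const char *ptr;
    size_t      size;
    size_t      cap;
} MiniCharBuf;

/* Initialized entirely at compile time; no heap, no host object. */
MiniCharBuf mini_snapshot_new = {
    NULL,             /* vtable (omitted in this sketch) */
    NULL,             /* host_obj: not created yet */
    "snapshot.new",   /* ptr */
    12,               /* size */
    13                /* capacity, including the terminating NUL */
};

/* Assumed rule: with no host object, Lucy owns the single reference. */
long
MiniCharBuf_get_refcount(const MiniCharBuf *self)
{
    assert(self != NULL);
    return self->host_obj ? self->host_obj->refcnt : 1;
}
```
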
>> > void*
>> > Obj_to_host(Obj *self)
>> > {
>> > if (self->host_obj) {
>> > /* The Python object is already cached, so incref it and return. */
>> > Py_INCREF((PyObject*)self->host_obj);
>> > return self->host_obj;
>> > }
>> > else {
>> > /* Lazily create Python object. */
>> > self->host_obj = PyCObject_FromVoidPtr((void*)self, Obj_Destroy)
>> > }
>> > }
>>
>> (missing a return on the else clause, but I get it).
>
> Heh. Why won't my email client test my code for me? :)
Someday :)
>> So this is a great approach, in that a host obj
>> is not created immediately on creating a Lucy obj. However, it's
>> still "falsely" created, in order to track refCount > 1 from within
>> Lucy, even when the obj never crosses the bridge.
>
> Yes. But if we are parsimonious about creating objects in the first place, it
> doesn't matter so much if constant costs per object are large.
Yes.
>> If only host languages let us override what decRef does for a given
>> obj... then we could break the tight cycles ourself and only allocate
>> a host obj when needed.
>
> There's an alternative. We can waste an extra 4 bytes per object to hold an
> integer refcount, and use that unless a host object has been cached. The
> instant that the host object has been cached, we set its refcount and use that
> instead.
>
> I'm not sure that's either clearer or faster, but it does mean that we
> wouldn't ever needlessly create host objects.
That's interesting... you could probably simply use that host_obj
field to hold low value ref counts (which are not valid pointers).
Though that's scary-C-hack-territory :)
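
Roughly what I have in mind, as a sketch only. It assumes C99's uintptr_t is available and that no valid pointer has a value of 255 or less; both are assumptions, and every name below (TagObj, TagHost, TAG_MAX_INLINE_REFCOUNT) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: no real host object lives at an address this small. */
#define TAG_MAX_INLINE_REFCOUNT ((uintptr_t)255)

/* Stand-in for the host object and its own refcount. */
typedef struct TagHost {
    long refcnt;
} TagHost;

typedef struct TagObj {
    uintptr_t ref;   /* small integer refcount, or a TagHost* as integer */
} TagObj;

int
TagObj_has_host(const TagObj *self)
{
    return self->ref > TAG_MAX_INLINE_REFCOUNT;
}

long
TagObj_get_refcount(const TagObj *self)
{
    if (TagObj_has_host(self)) {
        return ((TagHost*)self->ref)->refcnt;
    }
    return (long)self->ref;
}

void
TagObj_inc_refcount(TagObj *self)
{
    if (TagObj_has_host(self)) {
        ((TagHost*)self->ref)->refcnt++;
    }
    else {
        /* A real version would have to create a host object here. */
        assert(self->ref < TAG_MAX_INLINE_REFCOUNT);
        self->ref++;
    }
}

/* Cache a host object and hand the inline count over to it. */
void
TagObj_cache_host(TagObj *self, TagHost *host)
{
    assert(!TagObj_has_host(self));
    assert((uintptr_t)host > TAG_MAX_INLINE_REFCOUNT);
    host->refcnt = (long)self->ref;
    self->ref    = (uintptr_t)host;
}
```

This saves the extra 4 bytes per object, at the price of a branch on every refcount operation and a dependence on how the platform lays out addresses.
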
Mike