On Sun, Mar 29, 2009 at 11:48 PM, Marvin Humphrey
<[email protected]> wrote:
> On Sun, Mar 29, 2009 at 08:36:57AM -0400, Michael McCandless wrote:
>
>> Are VTables also considered Obj's?
>
> Yes.  They are structs, with the last member being a "flexible array member"
> which can hold any number of "method_t" pointers.  Here's the list of members
> (omitting host_obj and vtable, which are inherited).
>
>    VTable            *parent;
>    CharBuf           *name;         /* class name */
>    u32_t              flags;
>    size_t             obj_alloc_size;
>    size_t             vt_alloc_size;
>    Callback         **callbacks;
>    boil_method_t      methods[1];   /* flexible array */
>
> I've included the generated code for the VARRAY VTable below my sig.  It's a
> little messy because some mild hacks are needed to satisfy C89 and portability
> constraints.

What is the vtable for the VTable vtable?  Itself?
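
For readers following along, here is a minimal sketch of the layout described
above: a struct whose last member is a one-element array that is over-allocated
to hold as many method pointers as the class needs.  All names (DemoVTable,
demo_method_t, DemoVTable_new, demo_noop, num_methods) are hypothetical
stand-ins, not the generated Boilerplater code, and copying the parent's slots
into the child is an assumption about how inheritance might be set up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical method pointer type; stands in for boil_method_t. */
typedef void (*demo_method_t)(void);

/* Simplified stand-in for the VTable struct (member names are assumptions). */
typedef struct DemoVTable {
    struct DemoVTable *parent;
    const char        *name;          /* class name */
    size_t             vt_alloc_size; /* bytes allocated for this vtable */
    size_t             num_methods;   /* number of usable method slots */
    demo_method_t      methods[1];    /* C89-style "flexible" array */
} DemoVTable;

/* A do-nothing method, used only to have something to store in a slot. */
static void
demo_noop(void)
{
}

/* Allocate a vtable with room for num_methods slots.  Slots that the parent
 * also has are copied from the parent; the rest start out NULL. */
static DemoVTable*
DemoVTable_new(DemoVTable *parent, const char *name, size_t num_methods)
{
    size_t      i;
    size_t      size;
    DemoVTable *self;

    assert(num_methods >= 1);
    size = offsetof(DemoVTable, methods)
           + num_methods * sizeof(demo_method_t);
    self = (DemoVTable*)calloc(1, size);
    if (self == NULL) { return NULL; }

    self->parent        = parent;
    self->name          = name;
    self->vt_alloc_size = size;
    self->num_methods   = num_methods;
    for (i = 0; parent != NULL && i < parent->num_methods
                && i < num_methods; i++) {
        self->methods[i] = parent->methods[i];
    }
    return self;
}
```

Indexing past methods[0] is the classic "struct hack"; C99 would spell the
member as `methods[]` instead, which is presumably one of the mild hacks that
C89 compatibility forces.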

>> Do VTables only store methods?  Or can they store fields as well?
>
> You mean could they store class data?  I suppose they could.  Initializing
> might get a little messy, and I'd want to make sure that we locked them down
> and made them stateless before threading starts.

I actually meant data fields on the object, but class data ("static"
in Java) is also good.

>> Can an arbitrary Obj at runtime become a VTable for another Obj?
>> (True "prototype" programming language).  Seems like "no", because an
>> arbitrary Obj is not allowed to add new members wrt its parent (only
>> new VTables can do so).
>
> Correct.
>
>> Does VARRAY (VTable for VArray objects) hold a reference to OBJ?
>
> Yes: self->vtable->parent.  The Obj_Is_A() method, similar to Java's
> "instanceof", follows the chain upwards:
>
>    bool_t
>    Obj_is_a(Obj *self, VTable *target_vtable)
>    {
>        VTable *vtable = self ? self->vtable : NULL;
>        while (vtable != NULL) {
>            if (vtable == target_vtable) { return true; }
>            vtable = vtable->parent;
>        }
>        return false;
>    }

OK.
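
To make the parent-chain walk concrete, here is a small self-contained sketch
with three statically initialized vtables.  The IsaVTable/IsaObj types and the
ISA_* globals are hypothetical simplifications (only the parent pointer and a
name are kept); the loop itself mirrors the Obj_is_a() code quoted above.

```c
#include <assert.h>
#include <stddef.h>

/* Stripped-down stand-ins: only the members the walk needs. */
typedef struct IsaVTable {
    struct IsaVTable *parent;
    const char       *name;
} IsaVTable;

typedef struct IsaObj {
    IsaVTable *vtable;
} IsaObj;

/* Global vtables initialized at compile time, with Obj as the root. */
static IsaVTable ISA_OBJ     = { NULL,     "Obj"     };
static IsaVTable ISA_VARRAY  = { &ISA_OBJ, "VArray"  };
static IsaVTable ISA_CHARBUF = { &ISA_OBJ, "CharBuf" };

/* Return 1 if self's class is target or descends from it, 0 otherwise. */
static int
IsaObj_is_a(IsaObj *self, IsaVTable *target)
{
    IsaVTable *vtable = self ? self->vtable : NULL;
    assert(target != NULL);
    while (vtable != NULL) {
        if (vtable == target) { return 1; }
        vtable = vtable->parent;   /* single inheritance: one parent */
    }
    return 0;
}
```

With these definitions, an object whose vtable is ISA_VARRAY is an Obj and a
VArray but not a CharBuf, and a NULL object is nothing at all.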

>> How are these trees of VTables init'd?
>
> All core classes have VTables which are global structs initialized at compile
> time.  Boilerplater spits them out into a file called "boil.c".
>
> User subclass VTables are allocated on the fly at runtime.

OK.

>> And Lucy objs are single inheritance.
>
> Correct.
>
>> So a VArray is allowed to have C NULLs in its elems array (vs say Java
>> which always inits the array to hold Java null's).
>
> That's how it's implemented now.  Changing it over to something like Java
> null's might be a good idea.

Saves having to NULL check, though it's wasteful if the code that
created the VArray will immediately then fill it in.

>> Is there an explicit object in Lucy that represents null (java), None
>> (Python), etc.?
>
> Yes: UNDEF, a singleton belonging to the class Undefined.
> However, it's not used very much.

OK.

>> > First, note that the destructor for VArray invokes the destructor of its
>> > parent class, Obj.  This superclass call makes it possible for us to add
>> > members to a base class without having to manually edit all subclasses.
>>
>> Great.
>
> The same is true at construction time; each class implements two functions,
> new() and init(), and subclasses call their parent class's init():
>
>    TermQuery*
>    TermQuery_new(const CharBuf *field, const Obj *term)
>    {
>        TermQuery *self = (TermQuery*)CREATE(NULL, TERMQUERY);
>        return TermQuery_init(self, field, term);
>    }
>
>    TermQuery*
>    TermQuery_init(TermQuery *self, const CharBuf *field, const Obj *term)
>    {
>        Query_init((Query*)self, 1.0f);
>        self->field       = CB_Clone(field);
>        self->term        = Obj_Clone(term);
>        return self;
>    }

OK.
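
A compilable sketch of the new()/init() split, for illustration only.  The
CtorQuery/CtorTermQuery types, the plain C strings standing in for CharBuf and
Obj, and the ctor_strdup helper are all assumptions; the point is just that
the subclass init() delegates to the parent's init() before touching its own
members, so a member added to the base class is initialized in one place.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Base class: a subclass embeds it as its first member. */
typedef struct CtorQuery {
    float boost;
} CtorQuery;

typedef struct CtorTermQuery {
    CtorQuery base;    /* must come first so the cast below is valid */
    char     *field;
    char     *term;
} CtorTermQuery;

/* Hypothetical helper: heap copy of a C string (stands in for Clone). */
static char*
ctor_strdup(const char *source)
{
    size_t len  = strlen(source) + 1;
    char  *copy = (char*)malloc(len);
    if (copy != NULL) { memcpy(copy, source, len); }
    return copy;
}

static CtorQuery*
CtorQuery_init(CtorQuery *self, float boost)
{
    assert(self != NULL);
    self->boost = boost;
    return self;
}

static CtorTermQuery*
CtorTermQuery_init(CtorTermQuery *self, const char *field, const char *term)
{
    /* Parent init first, then this class's own members. */
    CtorQuery_init((CtorQuery*)self, 1.0f);
    self->field = ctor_strdup(field);
    self->term  = ctor_strdup(term);
    return self;
}

/* new() only allocates; all member setup lives in init(). */
static CtorTermQuery*
CtorTermQuery_new(const char *field, const char *term)
{
    CtorTermQuery *self = (CtorTermQuery*)calloc(1, sizeof(CtorTermQuery));
    if (self == NULL) { return NULL; }
    return CtorTermQuery_init(self, field, term);
}

static void
CtorTermQuery_destroy(CtorTermQuery *self)
{
    free(self->field);
    free(self->term);
    free(self);
}
```

Keeping allocation in new() and setup in init() also means a further subclass
can allocate its own, larger struct and still reuse CtorTermQuery_init().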

>> I'm a bit confused: what if you have a Lucy obj, that's got a cached
>> host obj, such that the host obj is not referred to anywhere in the
>> host language, but is referred to in Lucy, and Lucy finally decrefs
>> its last reference.
>
> At that point, the Lucy object and the Python object share a single refcount,
> which resides in the Python object's ob_refcnt member.  When you call Py_DECREF
> on the Python object and ob_refcnt falls to 0, Python then invokes the
> "__del__" method.  We will define "__del__" so that it invokes Destroy().

AHH, sorry, I see now.

Hmm...  Python's cyclic collector won't collect cycles involving
classes that have __del__ since it can't guess a safe order to run the
__del__ methods.  (I know we're expecting people to just avoid making
cycles, but still important to know).

>> How is the cycle broken in that case?
>
> There's no reference cycle because the Lucy object and the Python object share
> a single unified refcount.  They have C pointers which point at each other, but
> they don't hold refcounts open against each other.
>
>> (Ie, Destroy should be invoked via Lucy).
>
> The call sequence will be:
>
>    Obj_Dec_Refcount(lucy_obj);
>    Py_DECREF(lucy_obj->host_obj);
>    python_obj.__del__()
>    Obj_Destroy(lucy_obj);

OK.
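
Here is a rough simulation of that call sequence in plain C, with no Python
involved.  RcHost plays the role of the Python object (its refcnt member
standing in for ob_refcnt), and everything here (RcObj, RcHost, the function
names, the rc_destroy_count counter) is hypothetical.  It assumes, as
discussed in this thread, that a NULL host_obj means an implicit refcount of 1
held by Lucy, and that the host object is created the first time the count
needs to exceed 1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct RcObj;

/* Stand-in for the host (Python) object that owns the shared refcount. */
typedef struct RcHost {
    long          refcnt;     /* plays the role of ob_refcnt */
    struct RcObj *lucy_obj;   /* plain C pointer back; holds no refcount */
} RcHost;

typedef struct RcObj {
    RcHost *host_obj;         /* NULL means an implicit refcount of 1 */
} RcObj;

/* Counts destructor calls so the behavior can be observed. */
static int rc_destroy_count = 0;

static RcObj*
RcObj_new(void)
{
    return (RcObj*)calloc(1, sizeof(RcObj));
}

static void
RcObj_destroy(RcObj *self)
{
    rc_destroy_count++;
    free(self);
}

/* Host-side decref: at zero, the "finalizer" destroys the Lucy object. */
static void
RcHost_decref(RcHost *host)
{
    assert(host->refcnt > 0);
    if (--host->refcnt == 0) {
        RcObj *obj = host->lucy_obj;
        free(host);
        RcObj_destroy(obj);   /* the step "__del__" would perform */
    }
}

/* Incref: lazily create the host object to carry counts above 1. */
static RcObj*
RcObj_inc_refcount(RcObj *self)
{
    if (self->host_obj == NULL) {
        RcHost *host = (RcHost*)malloc(sizeof(RcHost));
        if (host == NULL) { return NULL; }
        host->refcnt   = 1;   /* the reference Lucy already held */
        host->lucy_obj = self;
        self->host_obj = host;
    }
    self->host_obj->refcnt++;
    return self;
}

/* Decref: either destroy directly or defer to the host's refcount. */
static void
RcObj_dec_refcount(RcObj *self)
{
    if (self->host_obj == NULL) { RcObj_destroy(self); }
    else                        { RcHost_decref(self->host_obj); }
}
```

There is only ever one counter per object pair, so the two back-pointers
cannot form a reference cycle.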

>> OK.  That 1 refCount "belonging" to Lucy.  This is essentially an
>> efficient way to represent the common case of "only Lucy has a single
>> reference to this object".
>
> Yes.  In addition, it allows us to initialize structs at compile-time.
>
>   CharBuf snapshot_new = {
>       (VTable*)&CHARBUF, /* vtable */
>       NULL,              /* <------ NULL host_obj */
>       "snapshot.new",    /* ptr */
>       12,                /* size */
>       13                 /* capacity */
>   };
>
> Since Python objects live on the heap, we can't create them at compile-time.
> The NULL conveniently stands in for them.

Great.

>> >  void*
>> >  Obj_to_host(Obj *self)
>> >  {
>> >      if (self->host_obj) {
>> >          /* The Python object is already cached, so incref it and return. */
>> >          Py_INCREF((PyObject*)self->host_obj);
>> >          return self->host_obj;
>> >      }
>> >      else {
>> >          /* Lazily create Python object. */
>> >          self->host_obj = PyCObject_FromVoidPtr((void*)self, Obj_Destroy)
>> >      }
>> >  }
>>
>> (missing a return on the else clause, but I get it).
>
> Heh.  Why won't my email client test my code for me? :)

Someday :)

>> So this is a great approach, in that a host obj
>> is not created immediately on creating a Lucy obj.  However, it's
>> still "falsely" created, in order to track refCount > 1 from within
>> Lucy, even when the obj never crosses the bridge.
>
> Yes.  But if we are parsimonious about creating objects in the first place, it
> doesn't matter so much if constant costs per object are large.

Yes.

>> If only host languages let us override what decRef does for a given
>> obj... then we could break the tight cycles ourself and only allocate
>> a host obj when needed.
>
> There's an alternative.  We can waste an extra 4 bytes per object to hold an
> integer refcount, and use that unless a host object has been cached.  The
> instant that the host object has been cached, we set its refcount and use that
> instead.
>
> I'm not sure that's either clearer or faster, but it does mean that we
> wouldn't ever needlessly create host objects.

That's interesting... you could probably simply use that host_obj
field to hold low value ref counts (which are not valid pointers).
Though that's scary-C-hack-territory :)
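
Roughly what I have in mind, as an untested-in-anger sketch.  The TagObj type,
the TAG_MAX_INLINE_REFCOUNT threshold of 255 and all function names are made
up for illustration.  It leans on the non-portable assumption that no real
host object ever lives at an address of 255 or below, which the C standard
does not promise.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ASSUMPTION: no valid host object pointer has a value this small. */
#define TAG_MAX_INLINE_REFCOUNT ((uintptr_t)255)

typedef struct TagObj {
    uintptr_t host_or_count;  /* small value: refcount; else: host pointer */
} TagObj;

/* A fresh object starts with an inline refcount of 1 and no host object. */
static void
TagObj_init(TagObj *self)
{
    self->host_or_count = 1;
}

static int
TagObj_has_host(const TagObj *self)
{
    return self->host_or_count > TAG_MAX_INLINE_REFCOUNT;
}

/* Bump the inline count.  Returns 1 on success, or 0 when the count is
 * already at the limit and the caller must switch to a real host object. */
static int
TagObj_inc_inline(TagObj *self)
{
    assert(!TagObj_has_host(self));
    if (self->host_or_count >= TAG_MAX_INLINE_REFCOUNT) { return 0; }
    self->host_or_count++;
    return 1;
}

/* Drop the inline count.  Returns 1 when it reaches zero, meaning the
 * caller should destroy the object. */
static int
TagObj_dec_inline(TagObj *self)
{
    assert(!TagObj_has_host(self));
    assert(self->host_or_count > 0);
    self->host_or_count--;
    return self->host_or_count == 0;
}

/* Once a host object is cached, the field stops being a counter; the
 * caller is expected to transfer the current count to the host first. */
static void
TagObj_set_host(TagObj *self, void *host)
{
    assert((uintptr_t)host > TAG_MAX_INLINE_REFCOUNT);
    self->host_or_count = (uintptr_t)host;
}

static void*
TagObj_get_host(const TagObj *self)
{
    return TagObj_has_host(self) ? (void*)self->host_or_count : NULL;
}
```

This saves the extra 4 bytes per object, at the price of every access to the
field having to check which of the two meanings is currently in force.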

Mike
