On Fri, Mar 27, 2009 at 08:21:54AM -0400, Michael McCandless wrote:

> On Fri, Mar 27, 2009 at 1:08 AM, Marvin Humphrey <[email protected]>
> wrote:
> 
> > I think I have an approach that's going to allow us to eliminate FastObj:
> > We lazily create the host object, and treat a NULL host_obj as
> > semantically equivalent to a refcount of 1.

I'm happy to report that this approach succeeded.  FastObj is now history.  :)

> Much of this is beyond me, but... 

Hopefully we will soon get to the point where that's no longer the case.  

Our KS prototype now has only one object model.  I'll describe how it works
for refcounting hosts like Perl and Python.

---

Every Obj is a struct with a VTable and a host object as its first two members:

  struct Obj {
      VTable   *vtable;
      void     *host_obj;
  };

  struct VArray {
      VTable   *vtable;
      void     *host_obj;
      Obj     **elems;
      u32_t     size;
      u32_t     capacity;
  };

When any Lucy object is created, self->host_obj is NULL.  
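
A small sketch of why the shared leading members matter: code that only knows about Obj can expect to find vtable and host_obj at the same offsets in any subclass struct.  The Demo* names below are invented for illustration and merely stand in for the real types; the check relies on C11's static_assert.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for VTable, Obj, and VArray. */
typedef struct DemoVTable { const char *class_name; } DemoVTable;

typedef struct DemoObj {
    DemoVTable *vtable;
    void       *host_obj;
} DemoObj;

typedef struct DemoVArray {
    DemoVTable *vtable;
    void       *host_obj;
    DemoObj   **elems;
    unsigned    size;
    unsigned    capacity;
} DemoVArray;

/* Compile-time checks: the build fails if a subclass reorders the prefix. */
static_assert(offsetof(DemoObj, vtable) == offsetof(DemoVArray, vtable),
              "vtable must be the first member of every class");
static_assert(offsetof(DemoObj, host_obj) == offsetof(DemoVArray, host_obj),
              "host_obj must be the second member of every class");

/* Runtime version of the same check; returns 1 when the layouts agree. */
static int
demo_layouts_match(void)
{
    return offsetof(DemoObj, vtable)   == offsetof(DemoVArray, vtable)
        && offsetof(DemoObj, host_obj) == offsetof(DemoVArray, host_obj);
}
```
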

Here are some simplified sample constructors for Lucy::Obj and our
variable-sized array class, Lucy::Util::VArray:

  Obj*
  Obj_new() {
      Obj *self      = (Obj*)malloc(sizeof(Obj));
      self->vtable   = (VTable*)VTable_Inc_RefCount(&OBJ);
      self->host_obj = NULL;
      return self;
  }

  VArray*
  VA_new(u32_t capacity) 
  {
      VArray *self   = (VArray*)malloc(sizeof(VArray));
      self->vtable   = (VTable*)VTable_Inc_RefCount(&VARRAY);
      self->host_obj = NULL;
      self->elems    = (Obj**)calloc(capacity, sizeof(Obj*));
      self->size     = 0;
      self->capacity = capacity;
      return self;
  }

Note that the VTable for the Obj class is OBJ, and the VTable for VArray is
VARRAY.  The same pattern holds true for other classes: TermScorer's VTable is
TERMSCORER, etc.

Here are corresponding destructors for Obj and VArray:  

  void
  Obj_destroy(Obj *self)
  {
      VTable_Dec_RefCount(self->vtable);
      free(self);
  }

  void
  VA_destroy(VArray *self)
  {
      u32_t i;
      for (i = 0; i < self->size; i++) {
          if (self->elems[i]) {
              Obj_Dec_RefCount(self->elems[i]);
          }
      }
      free(self->elems);
      Obj_destroy((Obj*)self); /* super */
  }

Two items of note about the destructors:

First, note that the destructor for VArray invokes the destructor of its
parent class, Obj.  This superclass call makes it possible for us to add
members to a base class without having to manually edit all subclasses.

Second, there is no mention whatsoever of self->host_obj in the destructor.
That's because there are only two paths into the destructor, and both of them
avoid the need for Lucy core code to worry about the cached host object.
    
  1) The cached host object was never created so it doesn't need to be 
     cleaned up.
  2) Destroy() is being invoked from host-space via e.g. Python's "__del__"
     method, and after it returns the host will clean up the host object
     itself.
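
The superclass-call pattern can be sketched in isolation.  This is only an illustration under stated assumptions: the Demo* names and the live-object counter are hypothetical, and the subclass embeds its parent struct as its first member (rather than repeating the fields) so that the cast in the "super" call is well-defined C.

```c
#include <assert.h>
#include <stdlib.h>

static int demo_live_objects = 0;   /* hypothetical bookkeeping for the demo */

typedef struct DemoBase {
    void *host_obj;
} DemoBase;

typedef struct DemoArray {
    DemoBase  base;    /* embedded parent: (DemoBase*)self points at it */
    int      *elems;
} DemoArray;

static void
demo_base_init(DemoBase *self)
{
    self->host_obj = NULL;
    demo_live_objects++;
}

static void
demo_base_destroy(DemoBase *self)
{
    /* Cleanup added here later is picked up by every subclass for free. */
    demo_live_objects--;
    free(self);
}

static DemoArray*
demo_array_new(size_t capacity)
{
    DemoArray *self = (DemoArray*)malloc(sizeof(DemoArray));
    assert(self != NULL);
    demo_base_init(&self->base);
    self->elems = (int*)calloc(capacity ? capacity : 1, sizeof(int));
    assert(self->elems != NULL);
    return self;
}

static void
demo_array_destroy(DemoArray *self)
{
    free(self->elems);                    /* subclass-owned resources first */
    demo_base_destroy((DemoBase*)self);   /* then the "super" call */
}
```

Note that, as in the destructors above, nothing here touches host_obj during teardown.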

Obj declares four methods which each host must implement: 

   Get_RefCount
   Inc_RefCount
   Dec_RefCount
   To_Host

Mike, since you're familiar with Python, I'll have a go at implementing those
methods for the Python bindings.

First, the accessor for the object's refcount, which is shared by the Lucy
object and the Python object.  If self->host_obj is NULL, then the refcount is
1.  Otherwise, we delegate responsibility for tracking the refcount to the
Python object cached in self->host_obj.

  u32_t
  Obj_get_refcount(Obj *self) 
  {
      if (self->host_obj == NULL) {
          return 1;  /* NULL host_obj implies a refcount of 1. */
      }
      else {
          PyObject *py_object = (PyObject*)self->host_obj;
          return py_object->ob_refcnt;
      }
  }

Next, the method which increments the refcount.  Calling this method even once
guarantees that a Python object will be created, since the first time it is
called, the refcount will progress from 1 to 2, and we need a place to put
that number.

This means that there are two ways to indicate a refcount of 1.  Either we
have a newly created Lucy object with a NULL self->host_obj which *implies* a
refcount of 1, or we have a cached host object which had a refcount of 2 or
more at some point, but which has fallen back down to an *explicit* refcount
of 1.

  Obj*
  Obj_inc_refcount(Obj *self)
  {
      if (self->host_obj == NULL) {
          self->host_obj = Obj_To_Host(self);
      }
      Py_INCREF((PyObject*)self->host_obj);
      return self;
  }

Once the host object is cached, it never goes away -- it's there for the life
of the Lucy object.

Next, the method to decrement the refcount.  Note that we only call Destroy()
directly if self->host_obj is NULL.  If we've created a Python object, then we
count on it to invoke the __del__ method when its refcount falls to 0; we will
have defined __del__ to invoke Destroy().

  u32_t
  Obj_dec_refcount(Obj *self)
  {
      if (self->host_obj == NULL) {
          /* NULL host object implies a refcount of 1.  That's dropping to 0
           * as a result of this call, so it's time to invoke Destroy(). */
          Obj_Destroy(self);
          return 0;
      }
      else {
          /* If the PyObject's ob_refcnt falls to 0, then the destructor will
           * be invoked from Python-space via the "__del__" method.  Read the
           * count first, since self may be gone once Py_DECREF returns. */
          PyObject *py_object = (PyObject*)self->host_obj;
          u32_t modified = (u32_t)(py_object->ob_refcnt - 1);
          Py_DECREF(py_object);
          return modified;
      }
  }

The last method we need to define is To_Host(), which, in the parlance of the
Python C API docs, will return a "new reference".  

(I'm not sure that this implementation is correct, but it should convey
the gist.)

  void*
  Obj_to_host(Obj *self)
  {
      if (self->host_obj) {
          /* The Python object is already cached, so incref it and return. */
          Py_INCREF((PyObject*)self->host_obj);
          return self->host_obj;
      }
      else {
          /* Lazily create Python object and cache it. */
          self->host_obj = PyCObject_FromVoidPtr((void*)self, Obj_Destroy);
          return self->host_obj;
      }
  }

> won't there be multiple references in C to a given Lucy object, each of
> which would need to incRef the RC?

Yes.  As soon as the refcount has to be increased above 1, we lazily create a
Host object to hold the refcount.  
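
To see the lazy scheme run without a Python interpreter, here is a minimal host-agnostic sketch.  Everything named Mock* or mock_* is hypothetical: MockHost stands in for a PyObject and keeps only a counter, and the "refcount hit 0" branch of mock_dec_refcount plays the part of the host calling __del__.  It is meant to convey the state transitions, not to mirror the real bindings.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a host object such as a PyObject. */
typedef struct MockHost {
    unsigned refcount;
} MockHost;

typedef struct MockObj {
    MockHost *host_obj;   /* NULL until the refcount must exceed 1 */
} MockObj;

static int mock_destroy_count = 0;   /* lets the demo observe Destroy() */

static MockObj*
mock_new(void)
{
    MockObj *self = (MockObj*)malloc(sizeof(MockObj));
    assert(self != NULL);
    self->host_obj = NULL;
    return self;
}

static void
mock_destroy(MockObj *self)
{
    mock_destroy_count++;
    free(self);
}

static unsigned
mock_get_refcount(MockObj *self)
{
    return self->host_obj ? self->host_obj->refcount : 1;
}

static MockObj*
mock_inc_refcount(MockObj *self)
{
    if (self->host_obj == NULL) {
        /* Lazily create the host object, starting at the implied count. */
        self->host_obj = (MockHost*)malloc(sizeof(MockHost));
        assert(self->host_obj != NULL);
        self->host_obj->refcount = 1;
    }
    self->host_obj->refcount++;
    return self;
}

static unsigned
mock_dec_refcount(MockObj *self)
{
    if (self->host_obj == NULL) {
        mock_destroy(self);          /* implied 1 -> 0 */
        return 0;
    }
    else {
        MockHost *host = self->host_obj;
        unsigned modified = --host->refcount;
        if (modified == 0) {
            /* Plays the part of the host's __del__: call Destroy(), then
             * let the "host" release its own object. */
            mock_destroy(self);
            free(host);
        }
        return modified;
    }
}

/* Walks one object through its life; returns 1 if every step matched. */
static int
mock_demo(void)
{
    int destroyed_before = mock_destroy_count;
    MockObj *obj = mock_new();
    int ok = mock_get_refcount(obj) == 1 && obj->host_obj == NULL;
    mock_inc_refcount(obj);                          /* implied 1 -> 2 */
    ok = ok && mock_get_refcount(obj) == 2;
    ok = ok && mock_dec_refcount(obj) == 1;          /* explicit 1 */
    ok = ok && obj->host_obj != NULL;                /* cache persists */
    ok = ok && mock_dec_refcount(obj) == 0;          /* destroyed */
    return ok && mock_destroy_count == destroyed_before + 1;
}
```

The demo passes through both ways of representing a refcount of 1: the implied one (host_obj is NULL) right after construction, and the explicit one after the count has risen to 2 and fallen back.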

---

Leaving aside the question of tracing GC hosts for now... does the
cached-host-object delegated refcounting model seem sufficiently clear to you
for use within Lucy Python bindings?

The rest of the Lucy library doesn't need to know about the host object
caching -- it just uses the opaque refcounting API, which looks like plain
old integer refcounting from the outside.

Marvin Humphrey
