On Nov 20, 2007, at 12:18 PM, Peter Karman wrote:

> Are you finding it makes it easier to do things with XS, C and the
> reference counting?

Every KS object, except those under the new, temporary class KinoSearch::Util::Nat, maintains its own refcount, separate from Perl's. When the SvREFCNT of a Perl object wrapping a KS object falls to 0, the DESTROY method that gets called is KinoSearch::Util::Obj::DESTROY, which simply decrements the KS object's internal refcount rather than invoking Kino_Obj_Destroy(obj).

  void
  DESTROY(self)
      kino_Obj *self;
  PPCODE:
      REFCOUNT_DEC(self);

We have to do things that way because there are many KS objects which Perl doesn't know about. For instance, when TopDocCollector's C constructor TDColl_new() is invoked, it creates its own HitQueue object without telling Perl anything about it. However, should we need to deal with that HitQueue from Perl-space, we have to wrap it in a Perl object. That's what happens here:

  {
      my $hit_queue = $collector->get_hit_queue;
  } # $hit_queue goes out of scope, DESTROY called

Currently, when that $hit_queue goes out of scope, the Perl wrapper object gets destroyed. However, the interior KS HitQueue object must not be destroyed, because $collector still needs it.
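
A rough C sketch of that arrangement may help. It is illustrative only: SketchObj, SketchWrapper and the sketch_* functions are made-up stand-ins, not KS or Perl API, and it assumes that wrapping an object bumps its internal refcount so that destroying the wrapper can safely decrement it.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a KS object that keeps its own refcount. */
typedef struct SketchObj {
    unsigned refcount;
    int     *destroyed;   /* demo flag: set to 1 when the object is freed */
} SketchObj;

static SketchObj *sketch_obj_new(int *destroyed) {
    SketchObj *self = malloc(sizeof *self);
    assert(self != NULL);
    self->refcount  = 1;          /* the creator's reference */
    self->destroyed = destroyed;
    *destroyed = 0;
    return self;
}

static SketchObj *sketch_obj_inc(SketchObj *self) {
    self->refcount++;
    return self;
}

static void sketch_obj_dec(SketchObj *self) {
    assert(self->refcount > 0);
    if (--self->refcount == 0) {
        *self->destroyed = 1;
        free(self);
    }
}

/* Hypothetical stand-in for the Perl wrapper object. */
typedef struct SketchWrapper {
    SketchObj *inner;
} SketchWrapper;

/* Assumption: wrapping takes a reference on the inner object. */
static SketchWrapper sketch_wrap(SketchObj *obj) {
    SketchWrapper wrapper = { sketch_obj_inc(obj) };
    return wrapper;
}

/* Plays the role of DESTROY: decrement only, never destroy directly. */
static void sketch_wrapper_destroy(SketchWrapper *wrapper) {
    sketch_obj_dec(wrapper->inner);
    wrapper->inner = NULL;
}
```

In this sketch, destroying the wrapper leaves the inner object alive for as long as its owner (the collector, in the example above) still holds a reference.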

As a consequence, KS objects can reappear wrapped in several different Perl objects, which is rather strange and is probably a bug waiting to bite someone. Here's an example of how things can go wrong: cycling through multiple Perl objects doesn't work well with the inside-out pattern, because DESTROY gets invoked over and over again, necessitating a broken hack like this...

  sub DESTROY {
      my $self = shift;
      if ($self->refcount < 2) {
          delete $inside_out_var{$$self};
      }
      $self->SUPER::DESTROY;
  }

That hack doesn't even work reliably: if the final refcount decrement happens inside KS, the Perl DESTROY method never gets called and any inside-out vars leak.

The solution is to cache a Perl object within a KS object, so that effectively Perl *does* know about it. That's the difference between Nat and Obj. Under Nat, the refcounting is handled via the cached Perl object. There are no longer two refcounts.
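
As a minimal sketch of the cached-object idea, assuming nothing about how Nat is really laid out: SketchHost below is a hypothetical stand-in for the Perl object and SketchNat for the KS object. The only refcount lives in the host object, and handing the KS object to host code always yields the same cached wrapper.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct SketchNat SketchNat;

/* Hypothetical stand-in for a Perl object: the only refcount lives here. */
typedef struct SketchHost {
    unsigned   refcount;
    SketchNat *native;
} SketchHost;

/* Hypothetical stand-in for a KS object that caches its host object. */
struct SketchNat {
    SketchHost *host;       /* created once, reused for every wrap */
    int        *destroyed;  /* demo flag: set to 1 when freed */
};

static SketchNat *sketch_nat_new(int *destroyed) {
    SketchNat  *self = malloc(sizeof *self);
    SketchHost *host = malloc(sizeof *host);
    assert(self != NULL && host != NULL);
    host->refcount  = 1;    /* the creator's reference */
    host->native    = self;
    self->host      = host;
    self->destroyed = destroyed;
    *destroyed = 0;
    return self;
}

/* Hand the object to host code: same wrapper every time, one more ref. */
static SketchHost *sketch_nat_to_host(SketchNat *self) {
    self->host->refcount++;
    return self->host;
}

/* Releasing the last reference frees both halves together. */
static void sketch_host_dec(SketchHost *host) {
    assert(host->refcount > 0);
    if (--host->refcount == 0) {
        *host->native->destroyed = 1;
        free(host->native);
        free(host);
    }
}
```

Because there is one wrapper and one count, teardown runs exactly once, which is what inside-out cleanup needs.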

One drawback of this design, though, is that Perl objects are heavyweight. That's OK for big stuff like a PostingList, but it's not so great for small stuff like a ByteBuf, a Token, or a TermInfo. If we were to put a Perl object into every last one of those, I'd be concerned about both memory usage and performance.

My current plan is to override the refcounting infrastructure for small classes by basing them off of a "FastObj" class which will use an integer refcount as Obj does now. The scheme is more complicated to implement than I'd like, and it will have the one-KS-object-many-Perl-objects problem for anything that subclasses FastObj. But it will work in the near term and maybe it won't be so bad.
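
One way such an override could be structured, sketched in C under loud assumptions: the vtable layout, SketchBase, and every function name here are invented for illustration and are not the planned KS design. Each class supplies its own refcounting functions, so small classes can use an inline integer while others keep their count in a separately allocated host object.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct SketchBase SketchBase;

/* Hypothetical per-class table of refcounting methods. */
typedef struct SketchVTable {
    void     (*inc_ref)(SketchBase *self);
    unsigned (*dec_ref)(SketchBase *self);  /* returns the remaining count */
    void     (*destroy)(SketchBase *self);
} SketchVTable;

struct SketchBase {
    const SketchVTable *vtable;
    union {
        unsigned  count;       /* "fast" variant: inline integer */
        unsigned *host_count;  /* hosted variant: count lives elsewhere */
    } ref;
};

static int sketch_live = 0;    /* demo counter of live objects */

static void     fast_inc(SketchBase *self)     { self->ref.count++; }
static unsigned fast_dec(SketchBase *self)     { return --self->ref.count; }
static void     fast_destroy(SketchBase *self) { sketch_live--; free(self); }

static void     hosted_inc(SketchBase *self) { (*self->ref.host_count)++; }
static unsigned hosted_dec(SketchBase *self) { return --(*self->ref.host_count); }
static void     hosted_destroy(SketchBase *self) {
    sketch_live--;
    free(self->ref.host_count);
    free(self);
}

static const SketchVTable SKETCH_FAST   = { fast_inc, fast_dec, fast_destroy };
static const SketchVTable SKETCH_HOSTED = { hosted_inc, hosted_dec, hosted_destroy };

static SketchBase *sketch_new(int fast) {
    SketchBase *self = malloc(sizeof *self);
    assert(self != NULL);
    if (fast) {
        self->vtable    = &SKETCH_FAST;
        self->ref.count = 1;
    } else {
        self->vtable         = &SKETCH_HOSTED;
        self->ref.host_count = malloc(sizeof(unsigned));
        assert(self->ref.host_count != NULL);
        *self->ref.host_count = 1;
    }
    sketch_live++;
    return self;
}

/* Callers never care which variant they hold. */
static void sketch_inc_ref(SketchBase *self) { self->vtable->inc_ref(self); }

static void sketch_dec_ref(SketchBase *self) {
    if (self->vtable->dec_ref(self) == 0) {
        self->vtable->destroy(self);
    }
}
```

The fast variant costs one integer per object, but nothing in it prevents several wrappers from pointing at the same object, which is the trade-off described above.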

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/

