Marvin Humphrey wrote:
>> > There are some quirks, though, with how it manages host objects.
>> > The default behavior is to create a host language object at the
>> > same time as the Boilerplater object, and have the host object
>> > manage the refcount.
>>
>> Hmm, sounds tricky... because there are typically consumers in C
>> and in the Host language and both need to incRef/decRef.
>
> Indeed, we need to accommodate refcounting ops both in the Lucy core
> and in the Host. For a refcounted Host like Perl or Python, all of
> these ops will affect a single, *unified* refcount which resides in
> the cached Perl/Python object at self->ref.host_obj. That's what I
> meant by "have the host manage the refcount" -- I wasn't clear
> enough that Lucy would be able to manipulate that host refcount
> using wrapper methods.
>
> The Lucy::Obj header at trunk/core/Lucy/Obj.bp will declare
> Inc_RefCount() and Dec_RefCount() methods:
>
>     /** Increment an object's refcount.
>      *
>      * @return the object, allowing an assignment idiom.
>      */
>     public incremented Obj*
>     Inc_RefCount(Obj *self);
>
>     /** Decrement an object's refcount, calling Destroy() if it hits 0.
>      *
>      * @return the modified refcount.
>      */
>     u32_t
>     Dec_RefCount(Obj *self);
>
> However, no implementation for these methods is provided in
> trunk/core/Lucy/Obj.c. It will be up to the bindings to provide an
> implementation, or a linking error will occur.
>
> For the Perl bindings, we'll provide a second
> Obj.c at trunk/perl/xs/Lucy/Obj.c which will contain the following:
>
>     lucy_Obj*
>     lucy_Obj_inc_refcount(lucy_Obj *self)
>     {
>         SvREFCNT_inc_simple_void_NN((SV*)self->ref.host_obj);
>         return self;
>     }
>
>     chy_u32_t
>     lucy_Obj_dec_refcount(lucy_Obj *self)
>     {
>         chy_u32_t modified_refcount = SvREFCNT((SV*)self->ref.host_obj) - 1;
>         /* If the SV's refcount falls to 0, DESTROY will be invoked from
>          * Perl-space.
>          */
>         SvREFCNT_dec((SV*)self->ref.host_obj);
>         return modified_refcount;
>     }
>
> That's how most objects in Lucy will be managed. However, that
> approach isn't ideal for all of them.
>
> The first, obvious objection to caching a host object inside every
> single Lucy object is that it wastes memory for those objects which
> never venture into Host-space; an integer refcount would require
> less overhead. The "FastObj" class was originally written to
> address this concern.

So... one alternative would be to track a private-to-Lucy refCount
separately from the host object's refCount? Then, for Lucy objects
that never cross the bridge, you wouldn't have to make a "false" host
object. But you'd need to take care to destroy an object when both
Lucy's Obj & the host's wrapper obj drop to refCount 1.
This may also be better for non-refCount languages (Java).

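Roughly what I have in mind, as a minimal sketch. DemoObj and the
Demo_* functions are made-up names, not anything in Lucy, and the
"host" side is simulated with a plain counter rather than a real Perl
or Java object. In this sketch an object is freed once neither side
holds a reference:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical object: two independent counts instead of one unified one. */
typedef struct DemoObj {
    uint32_t core_refs;  /* references held by C code */
    uint32_t host_refs;  /* references held through a host wrapper; stays 0
                          * if the object never crosses into Host-space */
} DemoObj;

/* Bookkeeping for the sketch only: how many objects have been freed. */
static int demo_destroy_count = 0;

/* Free the object once neither side holds a reference. */
static void
S_maybe_destroy(DemoObj *self)
{
    if (self->core_refs == 0 && self->host_refs == 0) {
        demo_destroy_count++;
        free(self);
    }
}

/* New objects start with one core reference and no host wrapper. */
DemoObj*
Demo_new(void)
{
    DemoObj *self = malloc(sizeof(DemoObj));
    if (self == NULL) { return NULL; }
    self->core_refs = 1;
    self->host_refs = 0;
    return self;
}

/* Core-side incref; returns the object to allow an assignment idiom. */
DemoObj*
Demo_inc_core(DemoObj *self)
{
    self->core_refs++;
    return self;
}

/* Core-side decref; may free the object. */
void
Demo_dec_core(DemoObj *self)
{
    assert(self->core_refs > 0);
    self->core_refs--;
    S_maybe_destroy(self);
}

/* Called when the host starts holding the object (wrapper created). */
void
Demo_host_acquire(DemoObj *self)
{
    self->host_refs++;
}

/* Called when the host lets go (wrapper destroyed); may free the object. */
void
Demo_host_release(DemoObj *self)
{
    assert(self->host_refs > 0);
    self->host_refs--;
    S_maybe_destroy(self);
}
```

An object that never leaves C only ever touches core_refs, so no host
object has to be allocated for it.
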
> However, that's not a major problem unless we're creating and
> destroying a boatload of small objects. Lucene 1.4.3 was a
> profligate wastrel in this regard, but KinoSearch's basic
> architecture has gotten pretty lean and has room to get leaner
> still. If memory use and speed were the only reasons to use
> FastObj, I think we could kill it off.

I think Lucene has improved here too, especially on the indexing side
(and I don't think the searching side creates too many tiny objects).

> However, there's a second, more annoying problem. It's not possible
> to declare static structs which contain e.g. a Perl object, because
> all Perl objects are malloc'd at runtime. That's inconvenient for
> declaring things like CharBuf literals or VTables:
>
>     /* Can't do this unless CharBuf is a subclass of FastObj. */
>     static CharBuf foo = {
>         (VTable*)&CHARBUF,
>         1,     /* ref.count */
>         "foo", /* character data */
>         3,     /* size */
>         4      /* capacity (includes terminating NUL) */
>     };
>
> It's probably possible to initialize all of our VTables, CharBuf
> literals, and such in a bootstrap routine, but it's enough of a pain
> to set something like that up that I haven't gone and made such a
> change in KS.
>
> I'd really like to kill off FastObj just for the sake of simplicity,
> though.

What objects are planned to subclass FastObj?

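On the bootstrap idea quoted above, here is a rough sketch of how it
might look. Everything in it is hypothetical: DemoCharBuf is a
simplified stand-in for CharBuf, S_make_host_obj() just mallocs a
placeholder where real bindings would create a host object, and
Demo_bootstrap() is an invented name. The static is declared with an
empty host_obj slot, and a one-time routine fills the slot in at
runtime:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified, hypothetical stand-in for a CharBuf with a cached host object. */
typedef struct DemoCharBuf {
    void       *host_obj;  /* cannot be initialized at compile time */
    const char *ptr;       /* character data */
    size_t      size;
    size_t      cap;       /* includes the terminating NUL */
} DemoCharBuf;

/* The static "literal": host_obj starts out empty. */
static DemoCharBuf foo = { NULL, "foo", 3, 4 };

/* Every static that needs a host object gets listed here. */
static DemoCharBuf *const bootstrap_list[] = { &foo };

static int bootstrapped = 0;

/* Placeholder for real host object creation in the bindings. */
static void*
S_make_host_obj(DemoCharBuf *self)
{
    (void)self;
    return malloc(sizeof(int));
}

/* Fill in host_obj for all listed statics. Safe to call more than once.
 * Returns 0 on success, -1 if a host object could not be created. */
int
Demo_bootstrap(void)
{
    size_t i;
    if (bootstrapped) { return 0; }
    for (i = 0; i < sizeof(bootstrap_list) / sizeof(bootstrap_list[0]); i++) {
        DemoCharBuf *obj = bootstrap_list[i];
        if (obj->host_obj == NULL) {
            obj->host_obj = S_make_host_obj(obj);
            if (obj->host_obj == NULL) { return -1; }
        }
    }
    bootstrapped = 1;
    return 0;
}

/* Accessor so callers outside this file can reach the static. */
DemoCharBuf*
Demo_get_foo(void)
{
    return &foo;
}
```

The cost is that every such static has to be registered in the list,
and nothing may use it before the bootstrap routine has run.
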
>> > I've tried searching the web for resources on how to make
>> > refcounting and GC coexist happily, but I haven't found anything
>> > so far. If anybody's got strong google-fu today or actually
>> > knows and can recommend some literature, I'm all ears.
>>
>> This is tricky!
>
> There's one scheme that I know will at least work under a tracing
> garbage collector: the one used by Ferret.
>
> * Within the C portion of Lucy, perform integer refcounting.
> * Every time a unique host wrapper object is created, increment
> the refcount.
> * Every time a host wrapper is destroyed, decrement the
> refcount.
>
> In other words, for Hosts that use tracing garbage collection, all
> Lucy objects would use an integer refcount, and nobody would cache
> any host objects.

I think, similarly, Java (JNI) lets you add a global reference to a
Java object, which is analogous to incref-ing. But caching a host
object would be fine? Why lose that? (I don't know Ferret/Ruby very
well...).

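Just to check that I'm reading the scheme quoted above correctly, here
is a toy sketch of it in C. All names (FObj, FWrapper, and their
functions) are invented for illustration and are not Ferret's or
Lucy's API; FWrapper is a plain struct standing in for a host wrapper
object, and FWrapper_finalize() plays the role of whatever finalizer
the host's GC would call:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* C-side object with a plain integer refcount and no cached host object. */
typedef struct FObj {
    uint32_t refcount;
} FObj;

/* Stand-in for a host wrapper; a real one would be a Ruby/Java object. */
typedef struct FWrapper {
    FObj *target;
} FWrapper;

/* Bookkeeping for the sketch only: how many FObj have been freed. */
static int fobj_destroyed = 0;

FObj*
FObj_new(void)
{
    FObj *self = malloc(sizeof(FObj));
    if (self == NULL) { return NULL; }
    self->refcount = 1;
    return self;
}

/* Increment the refcount; returns the object for assignment idioms. */
FObj*
FObj_inc(FObj *self)
{
    self->refcount++;
    return self;
}

/* Decrement the refcount, freeing at 0; returns the modified count. */
uint32_t
FObj_dec(FObj *self)
{
    uint32_t modified;
    assert(self->refcount > 0);
    modified = --self->refcount;
    if (modified == 0) {
        fobj_destroyed++;
        free(self);
    }
    return modified;
}

/* Creating a wrapper takes one reference on the C object. */
FWrapper*
FWrapper_new(FObj *target)
{
    FWrapper *wrapper = malloc(sizeof(FWrapper));
    if (wrapper == NULL) { return NULL; }
    wrapper->target = FObj_inc(target);
    return wrapper;
}

/* What the host GC's finalizer would do: give the reference back. */
void
FWrapper_finalize(FWrapper *wrapper)
{
    FObj_dec(wrapper->target);
    free(wrapper);
}
```

Each trip across the boundary that needs a host-visible object pays
for one FWrapper_new() and, eventually, one FWrapper_finalize().
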
> However, the Ferret approach has a drawback: You create and destroy
> host wrappers every time you cross the host/C boundary. That'll
> create a performance drag in some situations.

Yeah.

> The Ferret scheme won't cause problems with light usage of the
> library, because most of Lucy's work will be done within tight loops
> in the C core.

What about a HitCollector in the host language? Can you efficiently
re-use an object from the host language? (Python has such tricks,
e.g. to re-use a TupleObject during iteration).

> It also doesn't stop you from attaching host data to the C object
> using the "flyweight" design pattern, a.k.a. the "inside-out object"
> pattern, because you can still key data off of the unchanging C
> object memory address.

OK.

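For my own benefit, here is how I picture the address-keyed side
table, as a small sketch. The SideTable_* names are invented and a
linked list is used only to keep it short; a real implementation would
presumably use a hash table, and the host data would be a host object
rather than a bare pointer:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* One association: C object address -> host-side data. */
typedef struct SideEntry {
    const void       *key;        /* address of the C object */
    void             *host_data;  /* whatever the host attached */
    struct SideEntry *next;
} SideEntry;

static SideEntry *side_table = NULL;

/* Attach (or replace) host data for a C object.
 * Returns 0 on success, -1 on allocation failure. */
int
SideTable_store(const void *c_obj, void *host_data)
{
    SideEntry *entry;
    for (entry = side_table; entry != NULL; entry = entry->next) {
        if (entry->key == c_obj) {
            entry->host_data = host_data;
            return 0;
        }
    }
    entry = malloc(sizeof(SideEntry));
    if (entry == NULL) { return -1; }
    entry->key       = c_obj;
    entry->host_data = host_data;
    entry->next      = side_table;
    side_table       = entry;
    return 0;
}

/* Look up host data by C object address; NULL if nothing is attached. */
void*
SideTable_fetch(const void *c_obj)
{
    SideEntry *entry;
    for (entry = side_table; entry != NULL; entry = entry->next) {
        if (entry->key == c_obj) { return entry->host_data; }
    }
    return NULL;
}

/* Remove the association and return the data, or NULL if none. */
void*
SideTable_delete(const void *c_obj)
{
    SideEntry **link;
    for (link = &side_table; *link != NULL; link = &(*link)->next) {
        if ((*link)->key == c_obj) {
            SideEntry *dead = *link;
            void *host_data = dead->host_data;
            *link = dead->next;
            free(dead);
            return host_data;
        }
    }
    return NULL;
}
```

The C object's destructor would have to call SideTable_delete(), since
otherwise a later object allocated at the same address would inherit
stale data.
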
> However, once you start writing subclasses, all that OO overhead at
> the host/C boundary is going to slow down tight loops like
> Scorer_Next.
>
> Caching a host object for the life of the Lucy object solves
> that problem, but I'm not sure how to do that within the context of
> a tracing garbage collector.
>
> We can assume that the program initiates within Host-space; most
> Lucy objects will be able to trace back to the host. However,
> independent objects that we create as statics, globals, or C stack
> vars won't be visible to the garbage collector and will get
> reclaimed prematurely.
>
> Any ideas on how to pull off the caching trick? Is there something
> we can do if we allocate space within all of our Lucy objects for
> *both* an integer refcount and a cached host object? Do we need to
> add all new objects to a giant Hash that we tell the host about, and
> yank C stack vars out of that Hash before the C function returns?

What does Ruby's C-embedding API expose to interact with its GC? I
would imagine it'd be similar to Java's (i.e. "here's a new global
root object")?

I don't understand the C stack vars / hash question.

Mike