On Sun, Mar 29, 2009 at 9:20 PM, Marvin Humphrey <[email protected]> wrote:
> On Sun, Mar 29, 2009 at 05:55:11AM -0400, Michael McCandless wrote:
>
>> > FWIW, the cached-host-object algo wins the tight loop performance contest.
>>
>> Which makes sense, but how often are such loops [expected to be] used
>> in practice?  EG we already thought Collector inside host would be
>> doable but expected poor performance.
>
> The performance hit will matter whenever someone overrides a tight-loop
> method.  How much it matters will vary from host to host.
>
> Probably any Scorer or Collector would feel it.  DataReader: it depends on
> whether there are Scorer implementations which call on it in tight loops.
>
> InStream and OutStream: definitely would have, except that they're final
> classes so you can't override any methods.

Yeah OK.

> Analyzer: maybe not so much because the analysis method would be non-trivial.
> (Assuming that we settle on something like the KS "Inversion"-style Analyzers
> rather than the method-call-orgy of Lucene TokenStream.)

What's an Inversion-style analyzer?

> Searcher, Indexer, Query, QueryParser, DataWriter, IndexManager, Lock: no,
> because they don't operate light methods within tight loops.  FieldSpec,
> Schema: I don't think so because so far it's been possible to cache the return
> values of certain methods which would otherwise be called often.
> (KinoSearch::Index::Inverter performs such caching.)
>
>> How much work is being done when you create a new host-wrapper object?
>
> That will vary from Host to Host.  In Perl, we're using blessed scalars, which
> are about as cheap as Perl objects get.
>
> However, Perl method dispatch is comparatively slow; if method calls were
> faster, the effect would loom larger.  In Java, for instance, method calls are
> very fast, and creating a host wrapper object on each loop will probably cause
> a significant degradation much sooner.
>
> Put another way: caching a host wrapper object probably matters a lot more
> when method dispatch is fast.

OK.
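
Just to make sure I'm picturing the cost model right: in the uncached case
the wrapper creation lands inside the hot loop, something like the sketch
below (it reuses the names from your Matcher_collect() examples further
down, so treat it as pseudocode rather than the real thing):

  void
  Matcher_collect(Matcher *self, Collector *collector)
  {
    i32_t doc_num;
    Collector_Set_Matcher(collector, self);
    while (0 != (doc_num = Matcher_Next(self))) {
      /* Uncached model: create and destroy a host wrapper on every
       * iteration.  The faster host method dispatch is, the more this
       * per-iteration create/destroy pair dominates the loop body. */
      void *host_obj = Coll_To_Host(collector);
      Host_callback(host_obj, "collect", 1, ARG_I32("doc_num", doc_num));
      Coll_Destroy_Host(collector, host_obj);
    }
  }

If that's the right picture, then a cheap wrapper plus slow dispatch (Perl)
hides the cost, while a fast-dispatch host (Java) exposes it much sooner.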

>> Could you somehow re-use such objects?  (Sort of like how, at the water
>> section of a children's museum, they provide a bunch of yellow frocks
>> that the kids can wear while they play with the water, and return when
>> they are done.)
>
> Evocative metaphor. :)  So, who manages the pool?

It's a massively concurrent free-for-all with no locking :)  And
sometimes frocks get lost for a little while and a background sweeping
thread eventually reclaims them.
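
If we ever wanted a real pool, I picture something as simple as a small
free list that gets checked before allocating a fresh wrapper.  Sketch
only: every name below is invented rather than real Lucy/KS API, and it
punts on the locking and sweeping problems entirely.

  /* Hypothetical wrapper pool ("frock rack").  A real version would need
   * to be thread-local or locked, plus the background sweeper above. */
  #define FROCK_POOL_SIZE 8
  static void *frock_pool[FROCK_POOL_SIZE];
  static int   frock_pool_top = 0;

  void*
  Frock_take(Obj *obj)
  {
    if (frock_pool_top > 0) {
      void *host_obj = frock_pool[--frock_pool_top];
      Host_rebind_wrapper(host_obj, obj); /* point the frock at its new kid */
      return host_obj;
    }
    return Obj_To_Host(obj);              /* rack is empty: make a new one */
  }

  void
  Frock_return(Obj *obj, void *host_obj)
  {
    if (frock_pool_top < FROCK_POOL_SIZE) {
      frock_pool[frock_pool_top++] = host_obj;
    }
    else {
      Obj_Destroy_Host(obj, host_obj);    /* rack is full: let this one go */
    }
  }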

> Ideally, when we know we're going to be calling back to the host in a tight
> loop, we'd want the caller to cache the host object.  But that's not easy.
>
> Here's the basic implementation of Matcher_Collect():
>
>  void
>  Matcher_collect(Matcher *self, Collector *collector)
>  {
>    i32_t doc_num;
>    Collector_Set_Matcher(collector, self);
>    while (0 != (doc_num = Matcher_Next(self))) {
>      Coll_Collect(collector, doc_num);
>    }
>  }
>
> Say that the user has overridden Collector's Collect() method in the host, so
> it's slow.  To get best performance, we need to change how *Matcher* works,
> even though it's *Collector* that's the source of the slowdown.
>
>  void
>  Matcher_collect(Matcher *self, Collector *collector)
>  {
>    i32_t  doc_num;
>    void  *host_obj = Coll_To_Host(collector); /* <--- Cache wrapper here */
>
>    Collector_Set_Matcher(collector, self);
>    while (0 != (doc_num = Matcher_Next(self))) {
>      Host_callback(host_obj, "collect", 1, ARG_I32("doc_num", doc_num));
>    }
>
>    Coll_Destroy_Host(collector, host_obj); /* <--- Destroy wrapper here */
>  }
>
> (Host_callback() ordinarily takes a Lucy object as its first argument, but
> pretend that it takes a host wrapper object for now.)
>
> Coll_Collect doesn't know it's being called in a loop.  Even if it did, it
> could only cache host object wrappers using static or global vars. (Bye-bye
> thread safety, and what on earth do we do about subclasses?)

OK.

>  static void *Coll_collect_HOST_OBJ = NULL;
>  void
>  Coll_collect_OVERRIDE(Collector* self, i32_t doc_num)
>  {
>    /* Bleah. */
>    Coll_collect_HOST_OBJ = Coll_Validate_Or_Make_Wrapper(self,
>        Coll_collect_HOST_OBJ);
>    Host_callback(Coll_collect_HOST_OBJ, "collect", 1,
>        ARG_I32("doc_num", doc_num));
>  }
>
>> > We lose convenient use of the "inside-out-object" pattern. :(
>>
>> Neat.... I've seen a similar approach (though likely someone has given it
>> / would give it a different pattern name)
>
> The "Flyweight" design pattern is very similar.
>
>> In your example, shouldn't/couldn't the hash key somehow be the id of
>> the *Lucy* object, not of its fleeting host wrapper obj?
>
> Actually, that's what $$self is.
>
>   $self          <---- address of Perl object[1]
>   $$self         <---- address of Lucy object
>
> However, invoking $self->DESTROY on multiple wrappers around the same Lucy
> object is still a problem.

Hmmmm true.

>  sub DESTROY {
>      my $self = shift;
>      delete $foo{$$self};
>      $self->SUPER::DESTROY;  # Calls Dec_RefCount(), not Destroy() as it
>                              # would under the cached-object model.
>  }
>
> As soon as the wrapper has its refcount fall to 0, Perl invokes DESTROY, and
> despite the fact that the Lucy object doesn't get zapped yet, $foo{$$self}
> gets deleted.

OK, I'm convinced... it seems like we should stick with your approach
(make & cache the host wrapper when requested, or when the refcount becomes 2).
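
Roughly this shape, if I've got it right (names and struct fields below are
invented, not the actual implementation):

  /* Sketch of the cached-wrapper model: each Lucy object owns at most one
   * host wrapper, created lazily the first time it's requested (or when
   * the refcount climbs to 2) and handed back unchanged after that.  With
   * only one wrapper per Lucy object, host-side DESTROY (and the
   * inside-out hash cleanup) fires exactly once. */
  void*
  Obj_To_Host(Obj *self)
  {
    if (self->host_obj == NULL) {
      self->host_obj = Host_make_wrapper(self); /* e.g. a blessed scalar */
    }
    return self->host_obj;
  }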

Mike
