(I've divided my reply in half for the sake of digestibility...)

On Mon, Aug 27, 2012 at 8:09 PM, Peter Karman <[email protected]> wrote:
> If you follow [0] you'll see that the "least amount of code possible" is
> actually quite a lot of code. (Although it's quite possible I wrote more
> than I needed to -- help very welcome.)

I looked, and I don't see a lot of ways to shrink that code.  (Should `DELETE`
have been `DESTROY`, though?)

> It was a good learning experience for me. However, I don't wish to impose that
> learning experience on anyone else.

Some of this is TFIDF.  Some of it is extra layers of indirection within a
compiled TermQuery... There isn't a silver bullet, but maybe we can make some
progress...

> * It would be nice to have TermCompiler, TermMatcher (or whatever they end up
> being called) made public so that it is easier to extend all the basic Query
> types: PhraseQuery, RangeQuery, ProximityQuery. I ended up re-implementing all
> the C logic in Perl which I just know is going to be much slower in tight
> loops at search time.

FWIW, speed only matters for Matchers.

Under the MatchEngine model, Lucy::Search::TermCompiler would become
Lucy::TFIDF::TFIDFTermQuery and would no longer be linked directly with
TermQuery (TermQuery_Make_Compiler() would be eliminated, and its replacement
TermQuery_Make_Weighted_Query() would either be an abstract method or return
NULL.)  Once the connection to TermQuery is severed, there's less risk in
making that class public.
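
Here's a rough Python sketch of that separation, just to show the shape of
it.  Nothing here is real Lucy API: `TFIDFMatchEngine`, its `weight()` method,
and the constructor arguments are hypothetical names I've made up for
illustration, and the idf formula is only one common variant.

```python
import math


class TermQuery:
    """Plain query object: knows its field and term, nothing about scoring."""

    def __init__(self, field, term):
        self.field = field
        self.term = term

    def make_weighted_query(self, searcher):
        # Under the proposed model this either stays abstract or returns
        # nothing; weighting belongs to the engine, not to the query.
        return None


class TFIDFTermQuery:
    """Stand-in for what TermCompiler would become (hypothetical fields)."""

    def __init__(self, parent, doc_freq, doc_count):
        self.parent = parent
        # One common idf formula, used here purely for illustration.
        self.idf = 1.0 + math.log(doc_count / (1.0 + doc_freq))


class TFIDFMatchEngine:
    """Hypothetical engine that owns the TermQuery -> weighted query mapping."""

    def weight(self, query, doc_freq, doc_count):
        if isinstance(query, TermQuery):
            return TFIDFTermQuery(query, doc_freq, doc_count)
        raise TypeError("unsupported query type: %r" % type(query))
```

The point is only that TermQuery never names TFIDFTermQuery; the dependency
runs one way, from the engine to the query.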

> * I actually like the original names (Weight, Scorer) more than Compiler and
> Matcher. I understand the rationale for the change; the original names just
> have more connotative meaning for me. Oh wait. I've been here before.[2] I
> have changed my mind.

Heh.  I'm pleased with "Matcher".  "Compiler" was a mistake, but "Weight" also
sucks.

Hopefully, we can just eliminate Compiler and break the impasse.

> * I was surprised to find that Compiler isa Query. That didn't really jive for
> me with how Compiler keeps accessing parent(). I.e., a Query has-a Compiler.
> Why must a Compiler is-a Query?

This actually has more to do with Searcher.  If a Compiler wasn't a Query,
we'd need either additional parameters or additional methods inside Searcher.
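
A minimal Python sketch of the idea, assuming simplified stand-in classes
(the method names `make_compiler` and `top_docs` are illustrative, not Lucy's
real signatures):

```python
class Query:
    """Abstract search request (simplified stand-in)."""

    def make_compiler(self, searcher):
        raise NotImplementedError


class Compiler(Query):
    """A Query that has already been weighted; keeps a link to its parent."""

    def __init__(self, parent):
        self.parent = parent

    def make_compiler(self, searcher):
        # Already compiled, so there is nothing more to do.
        return self


class TermQuery(Query):
    def __init__(self, field, term):
        self.field = field
        self.term = term

    def make_compiler(self, searcher):
        return Compiler(self)


class Searcher:
    def top_docs(self, query):
        # One parameter, one method: callers may pass a raw Query or a
        # Compiler, because a Compiler is-a Query.
        if isinstance(query, Compiler):
            return query
        return query.make_compiler(self)
```

In this toy version `top_docs` just returns the compiler it would go on to
search with, which is enough to show why no second method is needed.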

In Lucene, most search methods have two implementations, differentiated by
method signature overloading -- one of them takes a Query, and the other takes
a Weight.  The method which takes a Query manufactures a Weight and dispatches
to the method which takes a Weight.
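
Sketched in Python, which has no signature overloading, so the two variants
get different (made-up) names here; none of these are Lucene's real method
names:

```python
class Weight:
    """Stand-in for a weighted form of a query."""

    def __init__(self, query):
        self.query = query


class Query:
    def create_weight(self, searcher):
        return Weight(self)


class LuceneStyleSearcher:
    def search_query(self, query, wanted):
        # The Query-taking variant manufactures a Weight...
        weight = query.create_weight(self)
        # ...and dispatches to the Weight-taking variant.
        return self.search_weight(weight, wanted)

    def search_weight(self, weight, wanted):
        # Real scoring would happen here; this toy just reports its inputs.
        return {"weight": weight, "wanted": wanted}
```

Every search entry point ends up duplicated this way, which is the cost that
making Compiler a Query avoids.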

(If you think Lucy is convoluted with regard to Query processing inside
Searcher, try spelunking Lucene sometime -- it's _way_ worse.)

> * The Matcher used by a TermCompiler is not a TermMatcher. It's a
> PostlistSomeThingOrOther. Huh? I'm sure that's an optimization, but it still
> caught me off guard and I had to hunt awhile.

If we go with MatchEngine and eliminate pluggable posting support, I think we
can rip out those layers of indirection.

Before:

    ScorePostingMatcher, which subclasses TermMatcher and wraps a
    SegPostingList which wraps an InStream.

After:

    TFIDFTermMatcher which wraps an InStream.
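
To illustrate the difference in layering, here's a toy Python sketch.  The
class names come from the lists above, but the bodies are invented: the
"posting format" is just one byte per doc id, `io.BytesIO` stands in for an
InStream, and doc id 0 is an assumed end-of-stream sentinel.

```python
import io

END = 0  # assumed sentinel: no more docs (doc ids start at 1 in this toy)


class SegPostingList:
    """Middle layer in the 'before' stack: decodes postings from a stream."""

    def __init__(self, instream):
        self.instream = instream

    def next(self):
        byte = self.instream.read(1)
        return byte[0] if byte else END


class TermMatcher:
    def next(self):
        raise NotImplementedError


class ScorePostingMatcher(TermMatcher):
    """Before: matcher -> posting list -> stream."""

    def __init__(self, plist):
        self.plist = plist

    def next(self):
        return self.plist.next()


class TFIDFTermMatcher:
    """After: matcher -> stream, with the decoding done inline."""

    def __init__(self, instream):
        self.instream = instream

    def next(self):
        byte = self.instream.read(1)
        return byte[0] if byte else END


def collect(matcher):
    """Drain a matcher into a list of doc ids."""
    docs = []
    doc_id = matcher.next()
    while doc_id != END:
        docs.append(doc_id)
        doc_id = matcher.next()
    return docs
```

Both stacks yield the same doc ids from the same bytes; the second just gets
there with one fewer object and one fewer method call per posting.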

Marvin Humphrey
