On 8/28/12 1:00 AM, Marvin Humphrey wrote:
(I've divided my reply in half for the sake of digestibility...)

On Mon, Aug 27, 2012 at 8:09 PM, Peter Karman<[email protected]>  wrote:
If you follow [0] you'll see that the "least amount of code possible" is
actually quite a lot of code. (Although it's quite possible I wrote more
than I needed to -- help very welcome.)

I looked, and I don't see a lot of ways to shrink that code.  (Should `DELETE`
have been `DESTROY`, though?)

ah yes, thank you. /me suffering from context-switching overload.



* It would be nice to have TermCompiler, TermMatcher (or whatever they end up
being called) made public so that it is easier to extend all the basic Query
types: PhraseQuery, RangeQuery, ProximityQuery. I ended up re-implementing all
the C logic in Perl which I just know is going to be much slower in tight loops
at search time.

FWIW, speed only matters for Matchers.

even when iterating over a PostingList? That's where I was expecting the biggest perf hit. But that's with no evidence at all.

Or maybe I should be doing that in Matcher instead of Compiler?



Under the MatchEngine model, Lucy::Search::TermCompiler would become
Lucy::TFIDF::TFIDFTermQuery and would no longer be linked directly with
TermQuery (TermQuery_Make_Compiler() would be eliminated, and its replacement
TermQuery_Make_Weighted_Query() would either be an abstract method or return
NULL.)  Once the connection to TermQuery is severed, there's less risk in
making that class public.

cool.



* I actually like the original names (Weight, Scorer) more than Compiler and
Matcher. I understand the rationale for the change; the original names just have
more connotative meaning for me. Oh wait. I've been here before.[2] I have
changed my mind.

Heh.  I'm pleased with "Matcher".  "Compiler" was a mistake, but "Weight" also
sucks.

Hopefully, we can just eliminate Compiler and break the impasse.

+1



* I was surprised to find that Compiler isa Query. That didn't really jibe for
me with how Compiler keeps accessing parent(). I.e., a Query has-a Compiler. Why
must a Compiler be a Query?

This actually has more to do with Searcher.  If a Compiler wasn't a Query,
we'd need either additional parameters or additional methods inside Searcher.

In Lucene, most search methods have two implementations, differentiated by
method signature overloading -- one of them takes a Query, and the other takes
a Weight.  The method which takes a Query manufactures a Weight and dispatches
to the method which takes a Weight.
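
A rough sketch of the two shapes described above, using hypothetical stand-in
classes (SketchQuery, SketchWeight, SketchCompiler and both searchers are
invented for illustration; they are not the real Lucene or Lucy APIs). The
first searcher needs a pair of overloaded methods because its Weight is not a
Query. The second gets by with one method because its Compiler is-a Query and
can be passed wherever a Query is accepted.

```java
// Hypothetical stand-ins only; none of these names exist in Lucene or Lucy.
class SketchQuery {
    final String term;
    SketchQuery(String term) { this.term = term; }

    // Build the weighted form of this query.
    SketchWeight makeWeight() { return new SketchWeight(this); }
}

// Overload style: the weighted form is a separate type that refers back
// to the query it was made from.
class SketchWeight {
    final SketchQuery parent;
    SketchWeight(SketchQuery parent) { this.parent = parent; }
}

class OverloadedSearcher {
    // Takes a query: manufacture a weight, then dispatch to the overload.
    String search(SketchQuery query) {
        return search(query.makeWeight());
    }

    // Takes a weight: this is where the real work would happen.
    String search(SketchWeight weight) {
        return "scored:" + weight.parent.term;
    }
}

// Subclass style: the compiled form is itself a query.
class SketchCompiler extends SketchQuery {
    final SketchQuery parent;
    SketchCompiler(SketchQuery parent) {
        super(parent.term);
        this.parent = parent;
    }
}

class SingleMethodSearcher {
    // One signature covers both cases: a compiled query is used as-is,
    // anything else is compiled first.
    String search(SketchQuery query) {
        SketchCompiler compiler = (query instanceof SketchCompiler)
            ? (SketchCompiler) query
            : new SketchCompiler(query);
        return "scored:" + compiler.parent.term;
    }
}
```

Either searcher returns the same result whether it is handed the raw query or
the already-weighted/compiled form; the difference is only in how many method
signatures the searcher has to expose.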

(If you think Lucy is convoluted with regard to Query processing inside
Searcher, try spelunking Lucene sometime -- it's _way_ worse.)

/me takes your word for it.



* The Matcher used by a TermCompiler is not a TermMatcher. It's a
PostlistSomeThingOrOther. Huh? I'm sure that's an optimization, but it still
caught me off guard and I had to hunt a while.

If we go with MatchEngine and eliminate pluggable posting support, I think we
can rip out those layers of indirection.

Before:

     ScorePostingMatcher, which subclasses TermMatcher and wraps a
     SegPostingList which wraps an InStream.

After:

     TFIDFTermMatcher which wraps an InStream.


cool.


--
Peter Karman  .  http://peknet.com/  .  [email protected]
