Re: Span queries, API and difficulties

Grant Ingersoll Sat, 22 Sep 2007 05:29:02 -0700

Hi Cedric,

Thanks for the detailed response. My suggestion would be to write up a set of patches that demonstrate what you want for the SpanQuery stuff and the BooleanQuery stuff, preferably as separate patches. The SpanQuery stuff makes the most sense to me and, since I am slowly but surely working on it, I could try to incorporate it.

As for the HitCollector, I am not exactly sure what you are trying to get at there. What Object is going to be passed in? Is it the Match object? What would it mean for other implementations that aren't using a Match object? How would it be incorporated into Lucene for a general case? Again, a patch here may make it obvious.


-Grant

On Sep 22, 2007, at 5:45 AM, melix wrote:

Hi all,
Sorry for the late response, I've been quite busy (working on my Lucene tweak, and still not finished ;)). Basically, I need to be able to find out what matched, on a per-document basis, for a complex query. For example, in an OR clause, I need to know which of the subclauses matched and, going deeper into the query tree, what matched within each subclause. The goal is to score documents using semantic reasoning.
As I want to avoid breaking Lucene compatibility, I've decided to subclass Lucene classes as much as possible. This is where it starts to get difficult. I've subclassed (most of) the span query classes so that the getSpans() method returns my own span interface:

public interface IExtendedSpans extends Spans, IMatcher {
}

public interface IMatcher {
    Match match();
}
The reason I have a separate IMatcher interface is that span queries are not the only queries that may "return" matches, as we'll see later. So I implemented my own SpanNearQuery, which inherits from the Lucene SpanNearQuery, so that when a span is found I can return the corresponding match. A match is a collection of submatches, and I've decided to subclass the Match class for each query type (this makes the algorithms more readable and easier to write). For a span near query, the match() method will basically return a SpanNearMatch, and so on.
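
A minimal sketch of the match hierarchy described above. The message only names Match, SpanNearMatch and match(); everything else here (TermMatch, the add/subMatches/describe methods, the position fields) is a hypothetical illustration and not Lucene API or the author's actual code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical base class: a match is a collection of submatches.
abstract class Match {
    private final List<Match> subMatches = new ArrayList<>();

    void add(Match sub) {
        subMatches.add(sub);
    }

    List<Match> subMatches() {
        return Collections.unmodifiableList(subMatches);
    }

    // Human-readable summary of what matched (assumed helper).
    abstract String describe();
}

// Hypothetical leaf match: one term found at one position.
class TermMatch extends Match {
    private final String term;
    private final int position;

    TermMatch(String term, int position) {
        this.term = term;
        this.position = position;
    }

    @Override
    String describe() {
        return term + "@" + position;
    }
}

// One Match subclass per query type: here, the span-near case.
class SpanNearMatch extends Match {
    private final int start;
    private final int end;

    SpanNearMatch(int start, int end) {
        this.start = start;
        this.end = end;
    }

    @Override
    String describe() {
        StringBuilder sb = new StringBuilder("near[" + start + "," + end + "](");
        String separator = "";
        for (Match sub : subMatches()) {
            sb.append(separator).append(sub.describe());
            separator = ", ";
        }
        return sb.append(")").toString();
    }
}

class MatchSketch {
    // Builds the match a span-near query over "span" and "query" might report.
    static Match example() {
        SpanNearMatch near = new SpanNearMatch(3, 6);
        near.add(new TermMatch("span", 3));
        near.add(new TermMatch("query", 5));
        return near;
    }

    public static void main(String[] args) {
        System.out.println(example().describe()); // prints near[3,6](span@3, query@5)
    }
}
```

Because each query type gets its own Match subclass, scoring code can walk the tree and branch on the concrete type of each node, which seems to be the readability benefit the message refers to.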
Problem: the members of the Lucene span query classes are private, not protected, so subclasses cannot use them. For example, my subclass needs access to the clauses, and I have to go through the getter where I could use the member directly (a performance implication). Next, the Spans subclasses are private static classes, and I have to rewrite them to return *my* spans. This is really annoying, because I have to copy the exact inner classes (if not anonymous...) just to add my match() method, and by doing so I'm potentially breaking compatibility with future releases of Lucene.
The problem was even harder when I had to add the match() method to BooleanQuery: this class is so complex, and uses so many protected or inner classes (for optimization purposes, I assume), that I would have to copy a lot of the original source code just to add my method. As documentation on how it works is really hard to find, I decided it would be simpler to write my own boolean queries (which is what I've done now). I know they must perform much worse, but it makes the task much easier.
By the way, it would be really great if you could extract an interface from the Query class. As all my queries implement an interface (to be sure that you don't mix queries which support the match feature with ones that don't), it would avoid many casts. (The other solution would be for me to extract the interface myself and have my IMatchAwareQuery interface declare those methods, but I'm sure it would be cleaner if this were directly in Lucene.)
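
To illustrate the casting issue, here is a rough sketch using stand-in types only. BaseQuery stands in for an abstract query base class and is not Lucene's real Query; IQuery, describe(), matchLabel() and the "2" variants are hypothetical names. Only IMatchAwareQuery comes from the message, and its methods are assumed.

```java
// Stand-in for an abstract query base class (NOT Lucene's real Query).
abstract class BaseQuery {
    abstract String describe(String field);
}

// Situation described in the message: an interface cannot extend a class,
// so it exposes none of the base-class methods.
interface IMatchAwareQuery {
    String matchLabel();
}

class MatchAwareTermQuery extends BaseQuery implements IMatchAwareQuery {
    private final String term;

    MatchAwareTermQuery(String term) {
        this.term = term;
    }

    @Override
    String describe(String field) {
        return field + ":" + term;
    }

    @Override
    public String matchLabel() {
        return "term(" + term + ")";
    }
}

// Requested situation: the base-class methods live in an interface
// that the match-aware interface can extend.
interface IQuery {
    String describe(String field);
}

interface IMatchAwareQuery2 extends IQuery {
    String matchLabel();
}

class MatchAwareTermQuery2 implements IMatchAwareQuery2 {
    private final String term;

    MatchAwareTermQuery2(String term) {
        this.term = term;
    }

    @Override
    public String describe(String field) {
        return field + ":" + term;
    }

    @Override
    public String matchLabel() {
        return "term(" + term + ")";
    }
}

class InterfaceSketch {
    // A cast is needed to reach the base-class method.
    static String today(IMatchAwareQuery q) {
        return ((BaseQuery) q).describe("body") + " " + q.matchLabel();
    }

    // No cast: the interface already declares describe().
    static String proposed(IMatchAwareQuery2 q) {
        return q.describe("body") + " " + q.matchLabel();
    }

    public static void main(String[] args) {
        System.out.println(today(new MatchAwareTermQuery("lucene")));
        System.out.println(proposed(new MatchAwareTermQuery2("lucene")));
    }
}
```

Both calls produce the same string; the difference is only that the second version is checked by the compiler, whereas the cast in the first can fail at run time if a match-aware object does not also extend the base class.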

Last but not least, it would be great if the HitCollector class had a collect() method with an Object parameter: the scoring I'm using cannot work on a collection of floats alone. It requires the matches, so I'm passing a DocMatchesHolder instance to my HitCollector so that it can work on it. This leads to the following (not really clean) code, copied into each of my top-level Scorer implementations:

public void score(HitCollector aHitCollector) throws IOException {
    if (aHitCollector instanceof SearchingContext) {
        SearchingContext ctx = (SearchingContext) aHitCollector;
        while (next()) {
            final DocMatchesHolder doc = docMatches();
            final float score = score();
            ctx.addHit(doc, score);
            ctx.collect(doc(), score);
        }
    } else {
        super.score(aHitCollector);
    }
}
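
One possible shape for the requested collect() method, sketched with stand-in classes rather than Lucene's HitCollector. SketchCollector, the three-argument overload and both subclasses are assumptions made for illustration; the message does not say what signature is intended. In this version the overload has a default body that ignores the extra object, so collectors that only care about scores keep working unchanged.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for a hit collector (NOT Lucene's HitCollector).
abstract class SketchCollector {
    abstract void collect(int doc, float score);

    // Hypothetical overload: by default the extra object is ignored.
    void collect(int doc, float score, Object payload) {
        collect(doc, score);
    }
}

// A collector that only cares about document ids.
class ScoreOnlyCollector extends SketchCollector {
    final List<Integer> docs = new ArrayList<>();

    @Override
    void collect(int doc, float score) {
        docs.add(doc);
    }
}

// A collector that wants the per-document match information.
class MatchCollector extends SketchCollector {
    final List<Object> payloads = new ArrayList<>();

    @Override
    void collect(int doc, float score) {
        payloads.add(null); // no match information available
    }

    @Override
    void collect(int doc, float score, Object payload) {
        payloads.add(payload);
    }
}

class CollectorSketch {
    // Stand-in for a scorer loop: no instanceof check is needed.
    static void scoreAll(SketchCollector c, int[] docs, float[] scores, Object[] matches) {
        for (int i = 0; i < docs.length; i++) {
            c.collect(docs[i], scores[i], matches[i]);
        }
    }

    public static void main(String[] args) {
        int[] docs = {1, 4};
        float[] scores = {0.5f, 0.9f};
        Object[] matches = {"m1", "m4"};

        ScoreOnlyCollector plain = new ScoreOnlyCollector();
        MatchCollector rich = new MatchCollector();
        scoreAll(plain, docs, scores, matches);
        scoreAll(rich, docs, scores, matches);

        System.out.println(plain.docs);    // prints [1, 4]
        System.out.println(rich.payloads); // prints [m1, m4]
    }
}
```

The trade-off is that an untyped Object forces the receiving collector to cast, so a patch along these lines would probably still need to say what the object is for the general case.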

Thanks for reading ;)

Cedric
--
View this message in context: http://www.nabble.com/Span-queries%2C-API-and-difficulties-tf4500460.html#a12835063
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


------------------------------------------------------
Grant Ingersoll
http://www.grantingersoll.com/
http://lucene.grantingersoll.com
http://www.paperoftheweek.com/


