On Wednesday 07 May 2008 10:18:38, Eran Sevi wrote:
> Thanks Paul for your reply,
>
> Since my index contains a couple of million documents and the filter
> is supposed to limit the search space to a few thousand, I was hoping
> I wouldn't have to do the filtering myself after running the query on
> the whole index.

The code I gave earlier effectively does a filtered query search on the
index. It visits the resulting Spans, and does not provide a score value
per document as SpanScorer would do. Please make sure to test that code
thoroughly for reliable results.

> Maybe this is the case anyway and behind the scenes the filter does
> exactly what you suggested.

Yes, a filtered query search would use skipTo() on the Spans via
SpanScorer. But the difference between the normal case and your case is
that you don't need SpanScorer.

> From what I tested the number of results of the SpanQuery greatly
> affects the running speed so if I'm going to use about 0.1% of the
> results I'm losing a lot of time and memory for gathering and
> storing the spans I'm not going to use.
>
> I don't know how SpanQuery works internally but I guess that if the
> filter is known beforehand,

A Filter needs to make a BitSet available before the query search.

> it could speed things up quite a bit.

I would expect a substantial speedup from using skipTo() on the Spans
when only 0.1% of the results pass the filter.

Regards,
Paul Elschot

> Eran.
>
> On Wed, May 7, 2008 at 10:34 AM, Paul Elschot <[EMAIL PROTECTED]> wrote:
> > On Tuesday 06 May 2008 17:39:38, Paul Elschot wrote:
> > > Eran,
> > >
> > > On Tuesday 06 May 2008 10:15:10, Eran Sevi wrote:
> > > > Hi,
> > > >
> > > > I am looking for a way to filter a SpanQuery according to some
> > > > other query (on another field from the one used for the
> > > > SpanQuery). I need to get access to the spans themselves of
> > > > course. I don't care about the scoring of the filter results
> > > > and just need the positions of hits found in the documents that
> > > > match the filter.
> > >
> > > I think you'll have to implement the filtering on the Spans
> > > yourself. That's not really difficult, just use Spans.skipTo().
> > > The code to do that could look something like this (untested):
> > >
> > > Spans spans = yourSpanQuery.getSpans(reader);
> > > BitSet bits = yourFilter.bits(reader);
> > > int filterDoc = bits.nextSetBit(0);
> > > while ((filterDoc >= 0) && spans.skipTo(filterDoc)) {
> > >   boolean more = true;
> > >   while (more && (spans.doc() == filterDoc)) {
> > >     // use spans.start() and spans.end() here
> > >     // ...
> > >     more = spans.next();
> > >   }
> > >   if (! more) {
> > >     break;
> > >   }
> > >   filterDoc = bits.nextSetBit(spans.doc());
> >
> > At this point, no skipping on the spans should be done when
> > filterDoc equals spans.doc(), so this code still needs some work.
> > But I think you get the idea.
> >
> > Regards,
> > Paul Elschot
> >
> > > }
> > >
> > > Please check the javadocs of java.util.BitSet, there may
> > > be an off-by-one error in the arguments to nextSetBit().
> > >
> > > Regards,
> > > Paul Elschot
> > >
> > > > I tried looking through the archives and found some reference
> > > > to a SpanQueryFilter patch, however I don't see how it can help
> > > > me achieve what I want to do. This class receives only one
> > > > query parameter (which I guess is the actual query) and not a
> > > > query and a filter for example.
> > > >
> > > > Any help about how I can achieve this will be appreciated.
> > > >
> > > > Thanks,
> > > > Eran.

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
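
For reference, here is one possible way to rework the skipTo() loop discussed in this thread so that no skipping is done when the current span is already in a document that passes the filter. It is only a sketch and has seen the same amount of production use as the original, so please test it. SimpleSpans and ArraySpans are hypothetical stand-ins that exist only to keep the example self-contained; they are not Lucene classes. They assume that skipTo() always advances by at least one match and then continues until doc() >= target, which is how I read the Spans contract. FilteredSpans.collect() is also a made-up name.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Hypothetical stand-in for the part of Lucene's Spans used in this thread. */
interface SimpleSpans {
    boolean next();             // move to the next match, false when exhausted
    boolean skipTo(int target); // advance at least once, then until doc() >= target
    int doc();
    int start();
    int end();
}

/** Hypothetical in-memory spans over {doc, start, end} triples sorted by doc. */
class ArraySpans implements SimpleSpans {
    private final int[][] spans;
    private int pos = -1;

    ArraySpans(int[][] spans) { this.spans = spans; }

    public boolean next() {
        if (pos < spans.length) pos++;
        return pos < spans.length;
    }

    public boolean skipTo(int target) {
        // Assumed contract: always moves forward by at least one match.
        do {
            if (!next()) return false;
        } while (doc() < target);
        return true;
    }

    public int doc()   { return spans[pos][0]; }
    public int start() { return spans[pos][1]; }
    public int end()   { return spans[pos][2]; }
}

class FilteredSpans {
    /** Collects {doc, start, end} for every span in a document whose bit is set. */
    static List<int[]> collect(SimpleSpans spans, BitSet bits) {
        List<int[]> hits = new ArrayList<>();
        boolean more = spans.next();
        while (more) {
            // First filter document at or after the current span's document.
            int filterDoc = bits.nextSetBit(spans.doc());
            if (filterDoc < 0) {
                break; // no filter documents left
            }
            if (filterDoc > spans.doc()) {
                // Current document is filtered out: skip ahead, then re-check,
                // because the document we land on may not pass the filter either.
                more = spans.skipTo(filterDoc);
            } else {
                // Current document passes the filter: use the span, do not skip.
                hits.add(new int[] {spans.doc(), spans.start(), spans.end()});
                more = spans.next();
            }
        }
        return hits;
    }
}
```

With the real classes, the two arguments would come from yourSpanQuery.getSpans(reader) and yourFilter.bits(reader) as in the quoted code, and the hits.add() line is where spans.start() and spans.end() get used. Calling nextSetBit(spans.doc()) with the current document number is deliberate: nextSetBit() is inclusive of its argument, so it returns the current document itself when that document passes the filter.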