On Saturday 25 June 2005 04:26, jian chen wrote:
> Hi,
>
> I think Span query in general should do more work than simple Phrase
> query. Phrase query, in its simplest form, should just try to find all
> terms that are adjacent to each other. Meanwhile, Span query does not
> necessary be adjacent to each other, but, with other words in between.
>
> Therefore, I think Span query deserves to be slower than Phrase query.
> This said, Span query is way more powerful than Phrase query.
>
> Jian
>
> On 25 Jun 2005 00:00:18 -0000, [EMAIL PROTECTED]
> <[EMAIL PROTECTED]> wrote:
> > Hi,
> >
> > I'm comparing SpanNearQuery to PhraseQuery results and noticing about
> > an 8x difference on Linux. Is a SpanNearQuery doing 8x as much work?
> >
> >
> > I'm considering diving into the code if the results sounds unusual to people.
> > But if its really doing that much more work, I won't spend time optimizing
> > something that can't get much faster.

The main difference is in the extra generality of Spans over positions.
Spans have a begin position and an end position. Matching two Spans for
the terms of a phrase requires testing both their begin positions and
their end positions, even though for a single term the two differ only
by a constant.

Spans also carry around their current document number, and this may
involve some more redundancy when finding the matches within a single
document.

Also, for exact matches (zero slop) PhraseQuery uses a separate scorer
that takes full advantage of the special case. So, when the generality
of the Spans is not needed, one should prefer a PhraseQuery.

I'm not surprised that SpanNearQuery is slower than PhraseQuery, and I'd
expect a factor of 3 to 4 between them. The factor of 8 might indicate
that there is some room for improvement in the span package. (I'd expect
the CellQueue in NearSpans to be the bottleneck.)

Regards,
Paul Elschot
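
To illustrate the difference in bookkeeping described above, here is a rough, language-neutral sketch in Python. It is not Lucene code and does not reflect how PhraseQuery's scorer or NearSpans are actually implemented: the names `phrase_match_positions`, `Span`, `term_spans` and `near_spans_ordered` are made up for this example, the span matcher is a brute-force loop, and the slop formula is a simplified assumption. It only shows that the exact-phrase case compares one integer per term, while the span case compares a document number, a begin and an end.

```python
from dataclasses import dataclass
from itertools import product


def phrase_match_positions(positions_per_term):
    """Exact phrase (zero slop): term i must occur at start + i.

    positions_per_term holds one list of positions per phrase term,
    all for the same document. Only single integers are compared.
    """
    first = positions_per_term[0]
    rest = [set(p) for p in positions_per_term[1:]]
    return [p for p in first
            if all(p + i + 1 in s for i, s in enumerate(rest))]


@dataclass(frozen=True)
class Span:
    doc: int    # every span carries its own document number
    start: int  # begin position
    end: int    # end position (exclusive)


def term_spans(doc, positions):
    """A single term gives spans of length 1: end is always start + 1."""
    return [Span(doc, p, p + 1) for p in positions]


def near_spans_ordered(spans_per_term, slop):
    """Brute-force ordered near match (illustrative only, not efficient).

    Each candidate needs a doc check plus begin/end comparisons,
    even though for plain terms end is just start + 1.
    """
    matches = []
    for combo in product(*spans_per_term):
        same_doc = all(s.doc == combo[0].doc for s in combo)
        in_order = all(a.end <= b.start for a, b in zip(combo, combo[1:]))
        if not (same_doc and in_order):
            continue
        # Assumed slop definition: positions inside the match that are
        # not covered by any of the sub-spans.
        covered = sum(s.end - s.start for s in combo)
        gap = (combo[-1].end - combo[0].start) - covered
        if gap <= slop:
            matches.append(Span(combo[0].doc, combo[0].start, combo[-1].end))
    return matches


if __name__ == "__main__":
    quick, fox = [1, 10], [3, 11]  # hypothetical positions in document 0
    print(phrase_match_positions([quick, fox]))
    print(near_spans_ordered([term_spans(0, quick), term_spans(0, fox)], 1))
```

With zero slop both functions find the same single occurrence in this example, but the span version does more comparisons per candidate to get there, which is the kind of overhead that goes away when a PhraseQuery is sufficient.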