...
> > So either this patch should be pulled, or we need to add
> > position-increment-like support to PhraseQuery. I plan to do the
> > latter in the next few months (for a contract I'm working on) so
> > perhaps we should just pull this patch until PhraseQuery is updated,
> > at which time we can consider updating QueryParser to take advantage
> > of this feature.
>
> Sounds good to me. I can't wait to see the new and improved
> PhraseQuery!

I have a question about how position increments are handled in
DocumentWriter's invertDocument (the main tokenization/indexing method).
It does the following:

    for (Token t = stream.next(); t != null; t = stream.next()) {
      position += (t.getPositionIncrement() - 1);
      addPosition(fieldName, t.termText(), position++);
      if (position > maxFieldLength) break;
    }

If I'm not mistaken, this means that the maxFieldLength comparison also
counts the "holes" in the token sequence. Such behaviour might be
problematic, especially if the holes are used to mark sentence/paragraph
boundaries (to reduce the score of, or avoid hits for, phrase queries),
which was discussed recently. Also, since that count is saved in the
index, such holes "bloat" the perceived document size, and thus reduce
the document's relative weight.

It would be easy to fix this to count only tokens (I can provide a patch
if so), but I wanted to make sure I'm not misunderstanding something
fundamental here.

-+ Tatu +-
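
P.S. To make the concern concrete, here is a minimal standalone sketch of the difference. It does not use any Lucene classes: the token stream is reduced to an array of position increments, and the names FieldLengthDemo, sampleIncrements, tokensIndexedByPosition and tokensIndexedByCount are all made up for illustration. The first method mirrors the cutoff in the loop quoted above; the second is one possible way to count only tokens, not necessarily what a real patch would look like.

```java
class FieldLengthDemo {

    // Hypothetical input: 20 tokens, where every fifth token (after the
    // first) carries an increment of 100 to mark a sentence boundary.
    static int[] sampleIncrements() {
        int[] increments = new int[20];
        for (int i = 0; i < increments.length; i++) {
            increments[i] = (i > 0 && i % 5 == 0) ? 100 : 1;
        }
        return increments;
    }

    // Mirrors the quoted loop: the cutoff compares the running position,
    // so gaps introduced by large increments count against the limit.
    static int tokensIndexedByPosition(int[] increments, int maxFieldLength) {
        int position = 0;
        int indexed = 0;
        for (int inc : increments) {
            position += inc - 1;
            indexed++;      // stands in for addPosition(...)
            position++;
            if (position > maxFieldLength) break;
        }
        return indexed;
    }

    // Assumed alternative: positions still advance by the increment, but
    // the cutoff compares the number of tokens actually added.
    static int tokensIndexedByCount(int[] increments, int maxFieldLength) {
        int position = 0;
        int indexed = 0;
        for (int inc : increments) {
            position += inc - 1;
            indexed++;      // stands in for addPosition(...)
            position++;
            if (indexed > maxFieldLength) break;
        }
        return indexed;
    }

    public static void main(String[] args) {
        int[] increments = sampleIncrements();
        System.out.println("by position: " + tokensIndexedByPosition(increments, 10));
        System.out.println("by count:    " + tokensIndexedByCount(increments, 10));
    }
}
```

With a limit of 10, the position-based cutoff stops after 6 tokens (the first boundary gap pushes the position past the limit), while the count-based cutoff keeps 11. Both variants keep the "check after adding" order of the quoted loop, so the only thing that differs is what gets compared against maxFieldLength.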