Thanks Jack, that's a very good option indeed. But this method is imprecise at sentence boundaries. Thinking of alternatives: is it possible to encode multiple values in the single int stored as the position increment and decode them the same way for SpanNearQuery, or is that a totally terrible idea?
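To make the question concrete, here is a minimal sketch of the packing idea in plain Java. It makes no Lucene calls; the class name, the method names and the 16-bit split between sentence index and word offset are all assumptions chosen for illustration, not anything Lucene provides.

```java
// Hypothetical helper: packs (sentence index, word offset) into one
// non-negative int. The 16/15-bit split is an arbitrary assumption.
final class PackedPosition {
    static final int WORD_BITS = 16;                      // low bits: word offset within the sentence
    static final int WORD_MASK = (1 << WORD_BITS) - 1;    // 0xFFFF
    static final int MAX_SENTENCE = (1 << (31 - WORD_BITS)) - 1; // keeps the sign bit clear

    static int pack(int sentence, int word) {
        if (sentence < 0 || sentence > MAX_SENTENCE || word < 0 || word > WORD_MASK) {
            throw new IllegalArgumentException("value out of range for the chosen bit layout");
        }
        return (sentence << WORD_BITS) | word;
    }

    static int sentenceOf(int packed) {
        return packed >>> WORD_BITS;
    }

    static int wordOf(int packed) {
        return packed & WORD_MASK;
    }

    public static void main(String[] args) {
        int obama = pack(4, 7);   // sentence 4, word 7
        int putin = pack(6, 2);   // sentence 6, word 2

        // Decoding both components gives the two distances separately.
        System.out.println("sentence distance: " + (sentenceOf(putin) - sentenceOf(obama)));

        // The raw difference mixes both components:
        // sentenceDiff * 65536 + wordDiff.
        System.out.println("raw difference: " + (putin - obama));
    }
}
```

Note that the raw difference of two packed values is sentenceDiff * 65536 + wordDiff, so as plain arithmetic this is the same as bumping the position to the next multiple of 65536 at every sentence break. A single distance threshold on the packed values would therefore behave like the multiple-of-1000 scheme; the two distances only become independent if something decodes both components before comparing.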
--
Igor

25.04.2013, 15:26, "Jack Krupansky" <j...@basetechnology.com>:
> You can use SpanNearQuery to seek matches within a specified distance.
>
> Lucene knows nothing about "sentences". But if you have an analyzer or
> custom code that artificially bumps the position to the next multiple of
> some number like 100 or 1000 when a sentence boundary pattern is
> encountered, you could use that number times n to match within n sentences,
> roughly, plus or minus a sentence or two - there is nothing to cause the
> nearness to be rounded or truncated exactly to one of those boundaries.
>
> Maybe you want two numbers: 1) sentence separation, say 1000, and 2) maximum
> sentence length, say 500. The SpanNearQuery would use n-1 times the sentence
> separation plus the maximum sentence length. Well, you have to adjust that
> for how you count sentences - is 1 the current sentence or is that 0?
>
> -- Jack Krupansky
>
> -----Original Message-----
> From: Igor Shalyminov
> Sent: Thursday, April 25, 2013 6:54 AM
> To: java-user@lucene.apache.org
> Subject: Multiple PositionIncrement attributes
>
> Hi all!
>
> I use PositionIncrement attribute for finding words at some distance from
> each other. And I have two problems with that:
> 1) I want to search words within one sentence. A possible solution would be
> to set PositionIncrement of +INF (like 30 :) ) to the sentence break tag.
> 2) I want to use in my search both word-distance and sentence-distance
> between words (e.g. find the word "Putin" within 3 sentences after the word
> "Obama" or find the words "cheese" and "bacon" in one sentence within 3
> words of each other).
>
> For the 2nd problem, is there a way of storing multiple position information
> sources in the index and using them for searching? Say, at least choosing
> one of those for a query.
>
> --
> Best Regards,
> Igor Shalyminov
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org