Hi Paul,

Do you mean the following?
For example, to index this:

"first second third <paragraphBorder> fourth fifth sixth"

originally it would be indexed as:

(first,0) (second,1) (third,2) (fourth,3) (fifth,4) (sixth,5)

and now it will be:

(first,0) (second,0) (third,0) (fourth,1) (fifth,1) (sixth,1)

Then the Query classes that depend on positional information
(PhraseQuery, SpanQueries) won't work, will they? Unfortunately, I'll
need those Query classes as well.

Cedric

> For each word in the input stream make sure that the position
> at which it is indexed in an extra field is the same as the paragraph
> number. That will involve only allowing a position increment at
> a paragraph border during indexing.
> Call this extra field the paragraph field if you will.
>
> Then, during search, search for a Term in the paragraph field, and
> use the position from that field, i.e. the paragraph number,
> to find a weight for the found term.
> Have a look at PhraseQuery on how to use term positions during
> search. It computes relative positions, but it works on the absolute
> positions that it gets from the index.
>
> SpanFirstQuery also allows you to do that; it's a bit more involved, but
> in the end it works from the same absolute positions from the index.
> The version at the jira issue will even allow using the length of the
> matching spans as the absolute paragraph number, which, in turn,
> allows the use of a Similarity for the paragraph weights [10/5/2].
>
> There is nothing special about indexed term positions; any term can
> be indexed at any position in a field. Lucene will take advantage of
> the incremental nature of positions by storing only compressed
> differences of positions in the index, but during search the original
> positions are directly available. You can do the same with payloads,
> but why reimplement something that is already available?
>
> Payloads have better uses than positional info; for one, they are
> great for avoiding disjunctions. For example, for verbs one could
> index only the stem and use a payload for the actual inflected
> form (singular/plural, past/present, first/second/third person, etc.).
>
> Regards,
> Paul Elschot
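
To make the two position schemes in this thread concrete, here is a minimal sketch in plain Python. It does not use the Lucene API; the function names and the two-field layout are only illustrative assumptions. It reads the quoted suggestion as keeping the normal field untouched and adding a separate paragraph field, so positional queries such as PhraseQuery would still see ordinary word positions in the normal field.

```python
PARAGRAPH_BORDER = "<paragraphBorder>"  # marker token taken from the example above


def word_positions(tokens):
    """Usual scheme: every word advances the position by one."""
    words = [t for t in tokens if t != PARAGRAPH_BORDER]
    return [(word, pos) for pos, word in enumerate(words)]


def paragraph_positions(tokens):
    """Paragraph-field scheme: the position advances only at a border."""
    postings = []
    paragraph = 0
    for token in tokens:
        if token == PARAGRAPH_BORDER:
            paragraph += 1  # the only place where the position is incremented
        else:
            postings.append((token, paragraph))
    return postings


tokens = "first second third <paragraphBorder> fourth fifth sixth".split()

# Hypothetical two-field layout: same words, different positions per field.
body_field = word_positions(tokens)            # for phrase and span matching
paragraph_field = paragraph_positions(tokens)  # position == paragraph number

print(body_field)
print(paragraph_field)
```

With both fields present, a term looked up in the paragraph field yields its paragraph number as its position, while the normal field still carries the consecutive positions that phrase and span matching rely on.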