: For example, if a field F has values A, B and C the following example
: cases arise:
: 1. A and B both generate no tokens ==> no positionIncrementGaps are
: generated
: 2. A has no tokens but B does ==> just the gap between B and C
: 3. A has tokens but B and C do not ==> both gaps between A and B, and
: between B and C are generated
:
: So, empty fields are treated anomalously. They are ignored for gap
: purposes at the beginning of the field list, but included if they occur
: later in the field list.

Since positions are always relative, I'm not sure I understand how this
caused a problem for you, but I suspect there's more to it than what you
describe. In each of the 3 cases you outlined, what happens if there is a
field value "D" which always produces tokens?

Based on your description so far, I'm guessing at the following scenario
(using lower case to indicate no tokens were produced and upper case to
indicate tokens were produced) ...

  1) a b C _gap_ D
     ...results in: C _gap_ D
  2) a B _gap_ C _gap_ D
     ...results in: B _gap_ C _gap_ D
  3) A _gap_ b _gap_ c _gap_ D
     ...results in: A _double_gap_ D

...is that the behavior you are seeing? Only case #3 seems "wrongish" to
me there.

I started to explain why I thought it made sense to go ahead and "fix
this", where by fix I meant only insert one gap in case #3, and then
realized I was actually arguing in favor of the current behavior for
case #3. Here is why...

   based on the semi-frequently discussed usage of token gap sizes to
   denote sentence/paragraph/page boundaries for the purpose of sloppy
   phrase queries, it certainly seems worthwhile to fix to me (so that
   queries like "find Erik within 3 pages of Otis" still work even if
   one of those pages is blank ...

...that's when I realized the current behavior of case #3 is actually
important for accurate matching; otherwise a search for two words within a
certain number of pages would have a false match if those pages were
blank.

Case #1 seems fine, but case #2 seems like the "wrong" case to me now,
because trying to find occurrences of "B" on page #1 using a SpanFirst
query will have false positives.

It seems like the positionIncrementGap should always be called/used after
any field value is added (even if the value results in no tokens) before
the next value is added (even if that value results in no tokens).

Does this jibe with what you were expecting, and the patch you were
considering?

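
To make the position arithmetic concrete, here is a minimal Python sketch of the two policies. It is a toy model, not Lucene code: the function name token_positions, the always_gap flag, and the gap of 100 are all made up for illustration, and the way positions advance is simplified compared to how an indexer really tracks position increments.

```python
def token_positions(values, gap, always_gap=False):
    """Return (token, position) pairs for a multi-valued field.

    values:     one list of tokens per field value ([] = no tokens produced)
    gap:        position increment gap added between values
    always_gap: False models the behavior described in this thread (a gap
                is only added once an earlier value has produced tokens);
                True models the suggestion of a gap after every value.
    Hypothetical helper for illustration only.
    """
    out = []
    position = 0          # position the next token would receive
    seen_tokens = False   # has any earlier value produced a token?
    for index, tokens in enumerate(values):
        if index > 0 and (always_gap or seen_tokens):
            position += gap
        for token in tokens:
            out.append((token, position))
            position += 1
            seen_tokens = True
    return out


GAP = 100  # assumed gap size, e.g. "one page"

blank_first = [[], ["B"], ["C"]]          # like case #2: a B C
blank_middle = [["Erik"], [], ["Otis"]]   # a blank page between two words

print(token_positions(blank_first, GAP))
# [('B', 0), ('C', 101)]
print(token_positions(blank_first, GAP, always_gap=True))
# [('B', 100), ('C', 201)]
print(token_positions(blank_middle, GAP))
# [('Erik', 0), ('Otis', 201)]
```

In this model a blank first value leaves B at position 0, which looks exactly like B being in the first value; with a gap after every value it lands at 100 instead. A blank value in the middle keeps Erik and Otis two gaps apart under either policy.
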
-Hoss