Is this a theoretical question or is there a use-case you're trying to support? If the latter, a statement of the problem you're trying to solve would be helpful.
If the former, setting all your start offsets to 0 seems wrong. You're essentially saying that all tokens are at the beginning of the document and each one is some ever-increasing distance from the start. Some of your later words will look really, really long (like the length of your document). Offhand, I expect this will affect up span queries, phrase queries, and who knows what else? Maybe scoring? Or maybe not, maybe some (or all) of these work off of termpositions rather than offsets. So unless you're trying to accomplish something specific, I sure wouldn't do this given the problems it *could* create. That said, I'm not much of an expert on how the offsets are used, but I'd be really leery of changing them "just for the fun of it" <G>...... Best Erick On Mon, May 12, 2008 at 12:06 PM, Brendan Grainger < [EMAIL PROTECTED]> wrote: > Hi, > > I have a TokenStream that inserts synonym tokens into the stream when > matched. One thing I am wondering about is what is the effect of the > startOffset and endOffset. I have something like this: > > Token synonymToken = new Token(originalToken.startOffset(), > originalToken.endOffset(), "SYNONYM"); > synonymToken.setPositionIncrement(0); > > What I am wondering is if I set the startOffset and endOffset to 0 and the > endOffset to the length of the synonym string what effect will this have? > > eg > Token synonymToken = new Token(0, repTok.endOffset(), "SYNONYM"); > synonymToken.setPositionIncrement(0); > > Thanks > >