Hello,

> Hi,
> I am storing custom values in the Tokens provided by a Tokenizer but
> when retrieving them from the index the values don't match.

What do you mean by retrieving? Do you mean retrieving terms, or do you mean running a search with words you know should be in the index and not finding a match? In the latter case, you must make sure that you use the same analyzer for searching as you used for indexing.

> I've looked in the LIA book but it's not current since it mentioned
> term vectors aren't stored. I'm using Lucene Nightly 146 but the same
> thing has happened with older versions. Looking at the internals,
> DocumentWriter seems to keep track of the end offset that was placed
> into the index and modifies the token values (with +1) but I'm not
> sure whether I should be concerned with it.
> No existing analyzers are used when adding the document so all the
> offsets are generated manually.
> Any suggestions of how the token offsets should be stored?

Look at other classes that implement TokenStream. Also take a look at setPositionIncrement when you are putting in your own terms.

Regards,
Ard

> Is this valid?
> Token, start, end
> aaa, 0, 3
> bbb, 4, 7
> ccc, 8, 11
>
> Thanks,
> Shahan
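
For what it's worth, the offsets listed in the question can be sanity-checked with a small sketch like the one below. It assumes the source text was "aaa bbb ccc" (the question does not say) and that the end offset is exclusive, i.e. one past the last character of the term, which as far as I know is the convention Lucene's Token uses. SimpleToken, tokenize and offsetsMatch are hypothetical names for illustration only; this is plain Java and does not use the Lucene API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for illustration; NOT Lucene's Token class.
final class OffsetSketch {

    static final class SimpleToken {
        final String text;
        final int start; // index of the first character in the source text
        final int end;   // one past the index of the last character (exclusive)

        SimpleToken(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }
    }

    // Splits on whitespace and records start/end offsets for each term.
    static List<SimpleToken> tokenize(String source) {
        List<SimpleToken> tokens = new ArrayList<>();
        int i = 0;
        while (i < source.length()) {
            // Skip any whitespace before the next term.
            while (i < source.length() && Character.isWhitespace(source.charAt(i))) {
                i++;
            }
            int start = i;
            // Advance to the end of the term.
            while (i < source.length() && !Character.isWhitespace(source.charAt(i))) {
                i++;
            }
            if (i > start) {
                tokens.add(new SimpleToken(source.substring(start, i), start, i));
            }
        }
        return tokens;
    }

    // True when the offsets select exactly the token's text in the source.
    static boolean offsetsMatch(String source, SimpleToken t) {
        return t.start >= 0
                && t.start <= t.end
                && t.end <= source.length()
                && source.substring(t.start, t.end).equals(t.text);
    }

    public static void main(String[] args) {
        String source = "aaa bbb ccc"; // assumed source text
        for (SimpleToken t : tokenize(source)) {
            System.out.println(t.text + ", " + t.start + ", " + t.end
                    + " match=" + offsetsMatch(source, t));
        }
    }
}
```

Under those assumptions the sketch produces the same triples as in the question (aaa 0 3, bbb 4 7, ccc 8 11), so the offsets themselves look consistent. If the values still differ after indexing, the position increments or the search-time analyzer would be the next things to check.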