Hi, this example is really close to what I'm trying to do but unfortunately it uses a lot of classes that are outdated in version 2.9.2 that I'm currently using.
Actually, it uses text in which boosts follow terms (delimited by a special char), parses the text and then adds documents to the index. I would like to do the same thing, but building the terms programmatically, without a text parser. I guess I should study the termAttribute API stuff. thanks bec! On Wed, Jun 2, 2010 at 8:37 AM, Rebecca Watson <bec.wat...@gmail.com> wrote: > Hi, > > if you want to store word+value pairs then use lucene scoring to weight > the words with higher vaules against them, you should look at using > payloads > and the DelimitedPayloadTokenFilter which lets you specify e.g. > word1|value1 word2|value2 ... > and the values are stored as payloads against the word tokens (which can > still be analyzed using stemming etc if you put the payload filter at the > beginning of the tokenizer stream set). > > then at query time you'd need to use a payload query type to get the > weights > included in the scoring for docs -- > see the article by lucid imagination here: > > http://www.lucidimagination.com/blog/2010/04/18/refresh-getting-started-with-payloads/ > > not sure if that's the sort of thing you're after? > > bec :) > > On 2 June 2010 04:29, Dionisis Koumouras <kum...@gmail.com> wrote: > > Thanks for your reply Grant. > > I checked out the TokenStream class and you are right but I'm afraid I > > didn't really make myself understood. What I want is to be able to create > a > > Document out of key-value pairs of terms and float numbers representing > word > > weights, insert the Document in the index and then use the lucene scoring > > mechanism to retrieve the entries. > > Do you find this feasible? > > > > On Tue, Jun 1, 2010 at 8:35 PM, Grant Ingersoll <gsing...@apache.org> > wrote: > > > >> > >> On May 31, 2010, at 6:25 AM, Dionisis Koumouras wrote: > >> > >> > Hi all, > >> > I'm new to lucene but have used it succesfully for a few simple tasks. > >> > > >> > I am experimenting with the vector space representation of documents > and > >> > have managed to store and retrieve TermFreqVector objects. > >> > > >> > The question is whether it is possible to directly add vector space > >> > representations of documents to an index. I can't find any way to > create > >> a > >> > document field from a TermFreqVector object. > >> > >> The Field constructor can take in a TokenStream (i.e. a preanalyzed > stream) > >> which you could easily back with a TermFreqVector. > >> > >> > > >> > This is the use case behind the question: retrieve some documents from > >> the > >> > index, cluster them, and store the vector space representations of the > >> > clusters back to the index. > >> > > >> > Dionisis > >> > >> -------------------------- > >> Grant Ingersoll > >> http://www.lucidimagination.com/ > >> > >> Search the Lucene ecosystem using Solr/Lucene: > >> http://www.lucidimagination.com/search > >> > >> > >> --------------------------------------------------------------------- > >> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > >> For additional commands, e-mail: java-user-h...@lucene.apache.org > >> > >> > > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > >