There are a few possible approaches here. We had a similar use case and went for the second one below. I primarily deal with Solr, so I don't know of Lucene-only examples, but hopefully you can dig some up.

(1) You can attach a payload to each occurrence of the tag and modify the scoring to use the payload.

(2) Use term frequency as a proxy. You could scale the probability by a factor and add the term to the field as many times as the scaled value (essentially making the scaled probability the term frequency). Scoring will then account for it. The advantage is that you also get score normalisation between keywords and tags, and amongst tags, and you can also filter results by probability.

(3) There is potentially also a solution using child documents and block join, but I may be mistaken; I haven't explored this much.

On 27 Oct 2014 16:10, "Ralf Bierig" <ralf.bie...@gmail.com> wrote:

> I want to index documents together with a list of tags (usually between
> 10-30) that represent meta information about this document. Normally, i
> would create an extra field "tag" store every tag, by its name, inside that
> field and create my 10-30 fields that and adding it to the document before
> adding the document to the index and writing the index.
>
> However, I have the following extra requirements:
>
> a) I need to have a weight in the range of [0,1] being associated with the
> tag that represents the probability of this tag being true.
>
> b) These tags must be associated with the document and not with the terms
> of the document.
>
> c) I must be able to associate many tags to a document instance.
>
> d) I must be able to use the weight in the weighting process of the search
> engine.
>
> e) The weight must be for the document instance, as the weight represents
> the probability for that tag for that particular document. E.g.
>
> fieldname: tag
> fieldvalue: tree
> fieldweight: 0.8
>
> meaning that this particular document is with a probability of 0.8 about
> trees.
>
> What is the best way to do that?
> Can somebody point me to an example or something quite similar that
> captures such a problem?
>
> Best,
> Ralf
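
To make approach (2) a bit more concrete, here is a minimal sketch in plain Java, for illustration only. It only builds the field value: each tag is repeated round(probability * scale) times, so that a whitespace-tokenized "tag" field would see the scaled probability as the term frequency. The class name, the method names and the scale factor of 10 are all hypothetical, and the sketch assumes every tag is a single token with no whitespace in it. Feeding the resulting string into a Lucene or Solr field is left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class TagFrequencyEncoder {

    // Hypothetical scale factor: a probability of 1.0 becomes 10 repetitions.
    static final int SCALE = 10;

    // Number of times a tag is repeated for a given probability in [0, 1].
    // Probabilities that round to 0 (below 0.05 at scale 10) drop the tag.
    static int repetitions(double probability, int scale) {
        if (probability < 0.0 || probability > 1.0) {
            throw new IllegalArgumentException("probability must be in [0, 1]");
        }
        return (int) Math.round(probability * scale);
    }

    // Builds a whitespace-separated field value in which each tag occurs
    // as many times as its scaled probability.
    static String encode(Map<String, Double> tagProbabilities, int scale) {
        StringJoiner joined = new StringJoiner(" ");
        for (Map.Entry<String, Double> entry : tagProbabilities.entrySet()) {
            int count = repetitions(entry.getValue(), scale);
            for (int i = 0; i < count; i++) {
                joined.add(entry.getKey());
            }
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        Map<String, Double> tags = new LinkedHashMap<>();
        tags.put("tree", 0.8);   // the example from the question
        tags.put("river", 0.3);  // a made-up second tag
        // Prints "tree" eight times followed by "river" three times.
        System.out.println(encode(tags, SCALE));
    }
}
```

A larger scale gives finer resolution at the cost of a longer field. Also keep in mind that the similarity may dampen term frequency (classic TF-IDF scoring uses its square root), so the effect on the score is not necessarily linear in the probability.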