The second solution sounds great and a lot more natural than payloads.

I know how to override the Similarity class, but it would only be called at search time and would then use the term frequency already in the index. Looking up the probabilities every time a search is performed would probably not perform well either. So I suspect I would need to find a way to store the term frequency directly in the index at the time I am indexing documents. Is that correct?

Do you have a code snippet that would highlight that part of your elegant solution?

Thanks in advance,
Ralf

On 28.10.2014 09:31, Ramkumar R. Aiyengar wrote:
There are a few approaches possible here; we had a similar use case and
went for the second one below. I primarily deal with Solr, so I don't know
of Lucene-only examples, but hopefully you can dig this up.

(1) You can attach payloads to each occurrence of the tag, and modify the
scoring to use the payload.

(2) Use term frequency as a proxy. You could scale the probability by a
factor and introduce the term as many times as the scaled value
(essentially making it the term frequency). Scoring will now account for
this. An advantage is that you can also achieve score normalisation with
keywords and amongst tags, and you can also filter results by probability.

(3) There is potentially also a solution using child documents and block
join, but I may be mistaken; I haven't explored this much.
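
Approach (2) can be sketched in plain Java as follows. This is a minimal
illustration, not code from this thread: the class name TagFrequencyEncoder,
its methods, and the scale factor of 10 are all hypothetical, and it assumes
each tag is a single token without whitespace. It only builds the text for
the tag field; the idea is that this text would then be indexed in a
tokenized field, so that the number of repetitions becomes the stored term
frequency at index time.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper: turns tag probabilities into repeated tokens. */
public class TagFrequencyEncoder {

    /** Maps a probability in [0,1] to a repeat count in [0, scale]. */
    public static int scaledFrequency(double probability, int scale) {
        // The negated range check also rejects NaN.
        if (!(probability >= 0.0 && probability <= 1.0)) {
            throw new IllegalArgumentException(
                    "probability must be in [0,1]: " + probability);
        }
        return (int) Math.round(probability * scale);
    }

    /** Builds the tag field text: each tag repeated scaledFrequency times. */
    public static String encode(Map<String, Double> tagProbabilities, int scale) {
        StringJoiner joined = new StringJoiner(" ");
        for (Map.Entry<String, Double> entry : tagProbabilities.entrySet()) {
            int frequency = scaledFrequency(entry.getValue(), scale);
            for (int i = 0; i < frequency; i++) {
                joined.add(entry.getKey());
            }
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, so the output is deterministic.
        Map<String, Double> tags = new LinkedHashMap<>();
        tags.put("tree", 0.8);
        tags.put("river", 0.3);
        // Prints "tree" 8 times followed by "river" 3 times.
        System.out.println(encode(tags, 10));
    }
}
```

A larger scale factor preserves more of the probability's resolution but
produces longer fields, and a probability that rounds to zero drops the tag
entirely. Note also that a similarity may dampen the raw count (for example
with a square root), so the score need not be linear in the probability.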
On 27 Oct 2014 16:10, "Ralf Bierig" <ralf.bie...@gmail.com> wrote:

I want to index documents together with a list of tags (usually between
10 and 30) that represent meta information about the document. Normally, I
would create an extra field "tag", store every tag by its name inside that
field, create my 10-30 such fields, and add them to the document before
adding the document to the index and writing the index.

However, I have the following extra requirements:

a) I need to have a weight in the range of [0,1] being associated with the
tag that represents the probability of this tag being true.

b) These tags must be associated with the document and not with the terms
of the document.

c) I must be able to associate many tags with a document instance.

d) I must be able to use the weight in the weighting process of the search
engine.

e) The weight must be for the document instance, as the weight represents
the probability for that tag for that particular document. E.g.

fieldname: tag
fieldvalue: tree
fieldweight: 0.8

meaning that this particular document is about trees with a probability
of 0.8.

What is the best way to do that?
Can somebody point me to an example or something quite similar that
captures such a problem?

Best,
Ralf

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org




