Hi,
I am currently implementing a new Similarity class in Lucene, based on
a language model with absolute discounting.
I am basing my work on the existing LMDirichletSimilarity and
LMJelinekMercerSimilarity classes, which are closely related.
However, to finish my implementation I need the number of unique
terms in the document, and this statistic does not seem to be
available from within the score function.
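
For context, here is a minimal, self-contained sketch of the absolute
discounting estimate I am aiming for, to show where the unique term count
enters. It does not use any Lucene API: the class name, method name, and
parameter names are hypothetical, and the formula is the standard textbook
form p(w|d) = max(c(w,d) - delta, 0) / |d| + delta * |d|_u / |d| * p(w|C),
where |d|_u is the number of unique terms in the document.

```java
// Hypothetical illustration only; not part of Lucene.
final class AbsoluteDiscountSketch {

    /**
     * Smoothed probability of a term under absolute discounting.
     *
     * @param freq           occurrences of the term in the document, c(w,d)
     * @param docLen         total number of tokens in the document, |d|
     * @param uniqueTerms    number of distinct terms in the document, |d|_u
     * @param collectionProb collection language model probability, p(w|C)
     * @param delta          discount, assumed to lie in [0, 1]
     */
    static double smoothedProbability(double freq, double docLen, double uniqueTerms,
                                      double collectionProb, double delta) {
        if (docLen <= 0) {
            // Empty document: fall back to the collection model alone.
            return collectionProb;
        }
        // Discounted maximum-likelihood part; zero for unseen terms.
        double discounted = Math.max(freq - delta, 0.0) / docLen;
        // Probability mass freed by the discount, which depends on the unique term count.
        double backoffWeight = delta * uniqueTerms / docLen;
        return discounted + backoffWeight * collectionProb;
    }

    public static void main(String[] args) {
        // Example values are made up for illustration.
        double seen = smoothedProbability(3, 100, 60, 0.001, 0.7);
        double unseen = smoothedProbability(0, 100, 60, 0.001, 0.7);
        System.out.println("seen term:   " + seen);
        System.out.println("unseen term: " + unseen);
    }
}
```

The backoff weight is the only place where the unique term count appears,
which is why I need it at scoring time in addition to the document length.
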
The computeNorm function in the Similarity class seems to be the right
place to compute (or read) and store this statistic, but I am not sure.
So I am reaching out to ask whether I am on the right track, and whether
you have any advice on how I could access this statistic from the
computeNorm function, if that is possible.
I would like the implementation to be as clean as possible and to follow
Lucene's coding conventions, so that I can submit it for integration
once it is done.
Thanks for your help,
Regards.
--
Romaric Pighetti
R&D - FranceLabs