Distinct terms within a document for new Similarity class

Romaric Pighetti Mon, 20 May 2019 03:05:22 -0700

Hi,

I am currently implementing a new similarity class in Lucene, based on a language model with absolute discounting. I am basing my work on the existing LMDirichletSimilarity and LMJelinekMercerSimilarity classes, which are very close to what I need. However, to finish my implementation I need the number of unique terms present in the document, and this information does not seem to be available natively from within the score function.
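To show why this statistic is needed, here is a minimal sketch of absolute-discount smoothing as it is usually formulated in the language-modeling literature. It is plain Java, not Lucene code: the class name, method name and parameters are all illustrative assumptions, and the final implementation may well differ.

```java
/** Illustrative sketch only; none of these names exist in Lucene. */
final class AbsoluteDiscountSketch {
    private AbsoluteDiscountSketch() {}

    /**
     * Smoothed p(w|d) = max(tf - delta, 0) / |d| + (delta * |d|_u / |d|) * p(w|C)
     *
     * @param termFreq              occurrences of the term in the document (tf)
     * @param docLength             total number of term occurrences in the document (|d|)
     * @param uniqueTerms           number of distinct terms in the document (|d|_u)
     * @param collectionProbability background probability of the term, p(w|C)
     * @param delta                 discount, assumed to lie in (0, 1)
     */
    static double smoothedProbability(long termFreq, long docLength, long uniqueTerms,
                                      double collectionProbability, double delta) {
        if (docLength <= 0) {
            // Empty document: fall back to the collection model alone.
            return collectionProbability;
        }
        // Mass left on the observed term after subtracting the discount.
        double discounted = Math.max(termFreq - delta, 0.0) / docLength;
        // Mass redistributed to the collection model; this is where the
        // per-document unique term count comes in.
        double backoffWeight = delta * uniqueTerms / docLength;
        return discounted + backoffWeight * collectionProbability;
    }
}
```

The back-off weight depends on the unique term count of each document, which is why it would have to be available at scoring time in addition to the document length.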

The computeNorm function in the Similarity class seems to be the right place to compute (or read) and store this statistic, but I am not sure. So I am reaching out to ask whether I am on the right track, and whether you have any advice on how I could access this statistic from the computeNorm function, if that is possible?

I would like the implementation to be as clean as possible with regard to Lucene's coding conventions, so that I can submit it for integration once it is done.


Thanks for your help,
Regards.

--
Romaric Pighetti
R&D - FranceLabs


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
