Do you expect to have relatively large or relatively small result sets? For the former, are you willing to accept slow performance? Your logic will have to scan all of the matching documents, fetching and checking their term frequencies, to count up the df for each desired term. Some of that information may already be available as part of the query matching process.

Still, that is a reasonable feature to want and it has been requested before. Worth a Jira.
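
For illustration, here is a rough sketch of the counting pass described above, in plain Java with no Lucene calls. It assumes the term frequencies of each matching document have already been fetched into a map; the class and method names (ResultSetIdf, localDocFreq, idf, score) are hypothetical and not part of any Lucene API. The idf and tf formulas follow the classic default similarity (idf = 1 + ln(numDocs / (docFreq + 1)), tf = sqrt(freq)), while norms, coord and queryNorm are left out, so this is not a drop-in replacement for real scoring.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: computes df against the matching result set only,
// instead of using the corpus-wide df stored in the index.
final class ResultSetIdf {

    // One pass over the matching documents: for each query term, count
    // how many of those documents contain it (term frequency > 0).
    static Map<String, Integer> localDocFreq(List<Map<String, Integer>> matchingDocs,
                                             Set<String> queryTerms) {
        Map<String, Integer> df = new HashMap<>();
        for (String term : queryTerms) {
            df.put(term, 0);
        }
        for (Map<String, Integer> termFreqs : matchingDocs) {
            for (String term : queryTerms) {
                if (termFreqs.getOrDefault(term, 0) > 0) {
                    df.merge(term, 1, Integer::sum);
                }
            }
        }
        return df;
    }

    // Classic tf/idf-style idf, with numDocs being the result set size here.
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    // Simplified score: sum over query terms of sqrt(tf) * idf^2.
    // Assumption: length norms, coord and queryNorm are ignored.
    static double score(Map<String, Integer> termFreqs,
                        Map<String, Integer> localDf,
                        int numMatchingDocs) {
        double sum = 0.0;
        for (Map.Entry<String, Integer> e : localDf.entrySet()) {
            int tf = termFreqs.getOrDefault(e.getKey(), 0);
            if (tf > 0) {
                double termIdf = idf(e.getValue(), numMatchingDocs);
                sum += Math.sqrt(tf) * termIdf * termIdf;
            }
        }
        return sum;
    }
}
```

The cost concern is visible here: the df values are only known after every matching document has been visited, so scoring needs a second pass over the result set (or the per-document term frequencies have to be kept in memory).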

-- Jack Krupansky

-----Original Message----- From: Christian Reuschling
Sent: Thursday, March 6, 2014 1:34 PM
To: java-user@lucene.apache.org
Subject: tf/idf similarity with modified document similarity

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello,

What is the best way to score documents similarly to the default similarity, but with the document frequency calculated per query against the matching result document set rather than statically against the whole corpus?

I haven't found a good, performant solution yet.

Thank you!

Christian
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iEYEARECAAYFAlMYv6AACgkQ6EqMXq+WZg+cjQCbBCwxnGyn18kEEbJ2aHbiyTNv
xpcAnRho4H/YGKzsmoOXN91+06nruhHa
=g3Ka
-----END PGP SIGNATURE-----

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
