Re: tf/idf similarity with modified document similarity

Jack Krupansky Fri, 07 Mar 2014 17:28:35 -0800

Do you expect to have relatively large or relatively small result sets? For the former, are you willing to accept slow performance? I mean, your logic will have to scan all of the documents and fetch and check their term frequencies to count up df for each desired term. Maybe at least some of that info is hanging around as part of the query matching process.

Still, that is a reasonable feature to want and it has been requested before. Worth a Jira.
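
To make the cost of that scan concrete, here is a rough sketch in plain Python rather than Lucene code. It assumes the matching documents are already available as token lists, and it borrows the classic tf-idf shape (tf = sqrt(freq), idf = 1 + ln(numDocs / (df + 1))) while leaving out norms, boosts and coord. The names result_set_idf and score are made up for illustration; nothing here is an existing Lucene API.

```python
import math
from collections import Counter


def result_set_idf(matching_docs, query_terms):
    """Compute idf per query term from the matching documents only.

    matching_docs: dict of doc id -> list of tokens (hypothetical input shape).
    """
    num_docs = len(matching_docs)
    if num_docs == 0:
        return {}
    df = Counter()
    # This is the expensive part: every matching document is visited once.
    for tokens in matching_docs.values():
        present = set(tokens)
        for term in query_terms:
            if term in present:
                df[term] += 1
    # Assumed idf formula, classic tf-idf style, with numDocs taken
    # from the result set instead of the whole corpus.
    return {t: 1.0 + math.log(num_docs / (df[t] + 1)) for t in query_terms}


def score(matching_docs, query_terms):
    """Score each matching document with result-set-local idf values."""
    idf = result_set_idf(matching_docs, query_terms)
    scores = {}
    for doc_id, tokens in matching_docs.items():
        tf = Counter(tokens)
        # Simplified score: sum of sqrt(tf) * idf over the query terms.
        scores[doc_id] = sum(math.sqrt(tf[t]) * idf[t] for t in query_terms)
    return scores


# Example: three documents that all matched the query "apple cherry".
docs = {
    "d1": ["apple", "banana", "apple"],
    "d2": ["apple", "cherry"],
    "d3": ["apple"],
}
scores = score(docs, ["apple", "cherry"])
```

Note that this takes two passes over the result set, one to count df and one to score, which is why large result sets would be slow.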


-- Jack Krupansky

-----Original Message-----
From: Christian Reuschling
Sent: Thursday, March 6, 2014 1:34 PM
To: [email protected]
Subject: tf/idf similarity with modified document similarity

Hello,

What is the best way to score documents similarly to the default similarity, but with the document frequency calculated per query against the matching result set rather than statically against the whole corpus?

I haven't found a good, performant solution yet.

Thank you!

Christian

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]

For additional commands, e-mail: [email protected]

