Ted Dunning wrote:
I don't think that this would be such a great idea.

Better to use a custom similarity
<http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc//org/apache/lucene/search/Similarity.html>
data structure.  Before you do that, though, you might try just using the
overall corpus statistics and not worry about this per-user indexing with
specialized statistics.  If users are no more different from each other
than sub-corpora in a normal retrieval system, then you are likely to get
much better results using corpus-wide stats than with user-level stats.
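
To make the comparison concrete, here is a minimal sketch in plain Java (no
Lucene dependency) of how an idf weight changes depending on whether numDocs
and docFreq come from the whole corpus or from one user's documents.  The
formula 1 + ln(numDocs / (docFreq + 1)) is the classic Lucene default idf;
the class name IdfSketch, the helper docFreq, and the toy documents are
assumptions made up for illustration, not part of any Lucene API.

```java
import java.util.List;
import java.util.Set;

// Hypothetical illustration only: compares idf from corpus-wide statistics
// with idf from a single user's documents.
final class IdfSketch {

    // Classic Lucene-style idf: 1 + ln(numDocs / (docFreq + 1)).
    static double idf(long numDocs, long docFreq) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    // Counts how many documents (modelled as term sets) contain the term.
    static long docFreq(List<Set<String>> docs, String term) {
        return docs.stream().filter(d -> d.contains(term)).count();
    }

    public static void main(String[] args) {
        // Toy data: one user's documents are a subset of the corpus.
        List<Set<String>> userDocs = List.of(
                Set.of("car", "engine"),
                Set.of("car", "road"));
        List<Set<String>> corpus = List.of(
                Set.of("car", "engine"),
                Set.of("car", "road"),
                Set.of("train", "rail"),
                Set.of("boat", "sea"));

        String term = "car";
        double userIdf = idf(userDocs.size(), docFreq(userDocs, term));
        double corpusIdf = idf(corpus.size(), docFreq(corpus, term));

        System.out.println("per-user idf:    " + userIdf);
        System.out.println("corpus-wide idf: " + corpusIdf);
    }
}
```

In this toy case "car" appears in every one of the user's documents, so the
per-user idf barely distinguishes it, while the corpus-wide figure still
does.  With small per-user collections the statistics are also noisy, which
is one reason to try the corpus-wide numbers first.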

On Mon, Jun 15, 2009 at 2:06 PM, Lionel Duboeuf
<[email protected]> wrote:
OK, even if I modify the similarity measure, I will still face the polysemy problem:
e.g. the term "car" in English means something different from the term "car" in French.
Also, what is the best approach to calculate numDocs for a given user easily and quickly?

Thanks for your answer.

lionel



