Re: index per-user basis and document frequency

lionel duboeuf Tue, 16 Jun 2009 01:52:00 -0700

Ted Dunning wrote:

I don't think that this would be such a great idea.


Better to use a custom similarity
<http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc//org/apache/lucene/search/Similarity.html>
data structure.  Before you do that, though, you might try just using the
overall corpus statistics and not worry about per-user indexing with
specialized statistics.  If users are no more different from each other
than sub-corpora in a normal retrieval system, then you are liable to get
much better results using corpus-wide stats than with user-level stats.
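
To make the corpus-wide versus per-user trade-off concrete, here is a rough sketch that computes idf both ways for one term. It assumes the classic Lucene-style formula idf = 1 + ln(numDocs / (docFreq + 1)); the toy documents, the user names and the helper names (`classic_idf`, `idf_for`) are made up for illustration and are not part of Lucene.

```python
import math


def classic_idf(doc_freq: int, num_docs: int) -> float:
    """Classic Lucene-style idf (assumed here): 1 + ln(numDocs / (docFreq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Hypothetical toy corpus: each document is (user, set of terms).
docs = [
    ("alice", {"car", "engine"}),
    ("alice", {"car", "road"}),
    ("alice", {"car", "wheel"}),
    ("bob", {"wine", "car"}),
    ("bob", {"wine", "grape"}),
    ("bob", {"wine", "cellar"}),
]


def idf_for(term, subset):
    """Compute idf for `term` using only the statistics of `subset`."""
    num_docs = len(subset)
    doc_freq = sum(1 for _, terms in subset if term in terms)
    return classic_idf(doc_freq, num_docs)


# Corpus-wide statistics: one idf value shared by every user.
corpus_idf = idf_for("car", docs)

# Per-user statistics: the same term gets a different weight for each user.
alice_idf = idf_for("car", [d for d in docs if d[0] == "alice"])
bob_idf = idf_for("car", [d for d in docs if d[0] == "bob"])

print(f"corpus={corpus_idf:.3f} alice={alice_idf:.3f} bob={bob_idf:.3f}")
```

With so few documents per user the per-user values swing widely around the corpus-wide one, which is the kind of noise that corpus-wide stats smooth out.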

On Mon, Jun 15, 2009 at 2:06 PM, Lionel Duboeuf
<[email protected]> wrote:

OK, even if I modify the similarity measure, I will still face a polysemy problem:
e.g. the term "car" in English means something different from the term "car" in French.

Also, what is the best approach to calculate numDocs for a given user easily (and quickly)?


Thanks for your answer.

lionel
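
On the numDocs question above, one possible approach (only a sketch, not something confirmed in this thread) is to keep a per-user counter that is updated whenever a document is added or removed, so the lookup is a constant-time dictionary access. The class name `UserDocCounter` and its methods are hypothetical. If each document were indexed with a user field, Lucene's `IndexReader.docFreq(Term)` on that field's term should give a comparable count directly from the index, assuming exactly one such term per document.

```python
from collections import Counter


class UserDocCounter:
    """Hypothetical helper: tracks how many documents each user owns."""

    def __init__(self):
        self._counts = Counter()

    def add(self, user_id: str) -> None:
        # Call once for every document indexed for this user.
        self._counts[user_id] += 1

    def remove(self, user_id: str) -> None:
        # Call once for every document deleted; never drops below zero.
        if self._counts[user_id] > 0:
            self._counts[user_id] -= 1

    def num_docs(self, user_id: str) -> int:
        # Unknown users simply have zero documents.
        return self._counts[user_id]


counter = UserDocCounter()
for owner in ["alice", "alice", "bob"]:
    counter.add(owner)
counter.remove("bob")

print(counter.num_docs("alice"), counter.num_docs("bob"), counter.num_docs("carol"))
```

The counter has to be kept in sync with the index (and rebuilt after a crash), which is the price paid for avoiding a query per lookup.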
