On 05/11/2008, at 4:36 AM, Michael McCandless wrote:
If possible, you should try to use a larger corpus (eg Wikipedia)
rather than multiply Reuters by N, which creates unnatural term
frequency distribution.
I'll replicate the tests with the Wikipedia corpus over the next few
days and
On 03/11/2008, at 11:07 PM, Mark Miller wrote:
Am I missing your benchmark algorithm somewhere? We need it.
Something doesn't make sense.
I thought I had included it at [1] before, but apparently not; my
apologies for that. I have updated that wiki page. I'll also reproduce
it here:
{
Howdy,
I have a couple of questions regarding some Lucene benchmarking and
what the results mean[3]. (Skip to the numbered list at the end if you
don't want to read the lengthy exegesis :)
I'm a developer for JIRA[1]. We are currently trying to get a better
understanding of Lucene, and
On 03/11/2008, at 4:27 PM, Otis Gospodnetic wrote:
Why are you optimizing? Trying to make the search faster? I would
try to avoid optimizing during high usage periods.
I assume that the original, long-ago decision to optimize was made to
improve search performance.
One thing that