Re: [jira] Commented: (LUCENE-845) If you "flush by RAM usage" then IndexWriter may over-merge

Doug Cutting Mon, 26 Mar 2007 10:18:25 -0800

Steven Parkes wrote:

And what about Project Gutenberg?


Wikipedia is going to have relatively short text, Gutenberg very long.

Very long documents are useful for testing for anomalies, but they're not so useful as retrieved documents, nor are they typical of applications. Very long hits are awkward for users. Book search engines usually operate best by breaking texts into small units (chapters, pages, overlapping windows, etc.) and searching those rather than the entire work, perhaps merging multiple hits from the same work in the displayed results. (See, e.g., California Digital Library's XTF system, built by Kirk Hastings using Lucene. http://www.cdlib.org/inside/projects/xtf/)
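A minimal sketch of the overlapping-window idea, assuming plain whitespace tokenization. All names here (`OverlappingWindows`, `split`, `Hit`, `mergeByWork`) are hypothetical. It does not use Lucene's API and is not how XTF works. Each window would be indexed as its own document tagged with its work ID. At display time, hits are collapsed so each work appears once, represented by its best-scoring window.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class OverlappingWindows {

    /** Hypothetical search hit: one window of one work, with its score. */
    static final class Hit {
        final String workId;
        final int window;
        final double score;

        Hit(String workId, int window, double score) {
            this.workId = workId;
            this.window = window;
            this.score = score;
        }

        @Override
        public String toString() {
            return workId + "#" + window + " (" + score + ")";
        }
    }

    /**
     * Splits tokens into windows of at most `size` tokens; each window starts
     * `step` tokens after the previous one, so step < size yields overlap.
     */
    static List<List<String>> split(List<String> tokens, int size, int step) {
        if (size <= 0 || step <= 0 || step > size) {
            throw new IllegalArgumentException("require 0 < step <= size");
        }
        List<List<String>> windows = new ArrayList<>();
        for (int start = 0; start < tokens.size(); start += step) {
            int end = Math.min(start + size, tokens.size());
            windows.add(new ArrayList<>(tokens.subList(start, end)));
            if (end == tokens.size()) {
                break; // this window already reaches the end of the text
            }
        }
        return windows;
    }

    /** Keeps only the best-scoring window per work, ordered by score descending. */
    static List<Hit> mergeByWork(List<Hit> hits) {
        Map<String, Hit> best = new HashMap<>();
        for (Hit h : hits) {
            Hit current = best.get(h.workId);
            if (current == null || h.score > current.score) {
                best.put(h.workId, h);
            }
        }
        List<Hit> merged = new ArrayList<>(best.values());
        merged.sort((a, b) -> Double.compare(b.score, a.score));
        return merged;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a b c d e f g h i j".split(" "));
        for (List<String> w : split(words, 4, 2)) {
            System.out.println(w); // four overlapping windows of 4 tokens
        }
        List<Hit> hits = Arrays.asList(
                new Hit("bookA", 3, 0.7),
                new Hit("bookB", 0, 0.9),
                new Hit("bookA", 5, 0.8));
        System.out.println(mergeByWork(hits)); // [bookB#0 (0.9), bookA#5 (0.8)]
    }
}
```

With a window of 4 tokens and a step of 2, the ten-token sample yields the windows [a..d], [c..f], [e..h], [g..j], and the two bookA hits collapse to window 5. Making the step smaller than the window size means any phrase of at most size - step + 1 tokens appears whole in some window, even if it straddles a boundary.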


I think Wikipedia is a much more typical use of Lucene.

Doug
