Re: Benchmarking results

Grant Ingersoll Fri, 07 Apr 2006 05:42:19 -0700


Marvin Humphrey wrote:

On Apr 4, 2006, at 10:23 AM, Tatu Saloranta wrote:
So in this case, what would give more comparable results (assuming
you are interested in measuring likely server-side
usage scenario, which is usually what Lucene is used for)
Actually, I think the benchmark results illustrate that everyone should be at least mildly concerned about where the Term Vector data gets stored. KinoSearch only writes that data once. Lucene, however, has to read/write that data during each merge, and the more streams you have, the more complex the merge. It stands to reason that storing term vector data with the stored fields data would speed up the merge process.

This seems like a good idea, especially combined with the lazy loading/retrieve specified fields approach that we are proposing, so that we aren't getting the term vector every time we retrieve a document. We could deprecate the IndexReader.getTermVector methods and have the term vector accessed via the Field instead. Not sure what the issues are completely, but it makes sense, since the TV data is not changing.
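
A rough sketch of what lazy term vector access through the field might look like, written in Python for brevity rather than as Lucene code. Everything here is hypothetical: LazyField, tv_loader, and load_tv are illustrative names only, not part of Lucene's API, and the loader function merely stands in for a disk read.

```python
class LazyField:
    """Hypothetical stored field whose term vector is read only on first access."""

    def __init__(self, name, value, tv_loader):
        self.name = name
        self.value = value
        self._tv_loader = tv_loader  # callable standing in for a disk read
        self._term_vector = None
        self._loaded = False

    @property
    def term_vector(self):
        # Load on first access, then reuse the cached copy.
        if not self._loaded:
            self._term_vector = self._tv_loader()
            self._loaded = True
        return self._term_vector


reads = []  # records each simulated disk read


def load_tv():
    reads.append("tv")
    return {"lucene": 2, "merge": 1}  # term -> frequency


field = LazyField("body", "lucene merge lucene", load_tv)
print(field.value)                   # stored value only, no term vector read
print(len(reads))                    # number of reads so far
print(field.term_vector["lucene"])   # first access triggers the read
print(len(reads))                    # number of reads after access
```

The point of the design is that retrieving a document's stored value costs nothing extra, and the term vector is paid for only by callers that actually ask for it.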

Are there any other significant applications?

Clustering. Corpus analysis/browsing. Most likely others.

--

Grant Ingersoll
Sr. Software Engineer
Center for Natural Language Processing
Syracuse University
School of Information Studies
335 Hinds Hall
Syracuse, NY 13244
http://www.cnlp.org
Voice: 315-443-5484
Fax: 315-443-6886


