I have successfully used term vectors for both the TREC English and
Arabic collections. Take a look at the code I posted at
http://www.cnlp.org/apachecon2005 or Erik and Otis' book, "Lucene In
Action", which both have good examples of term vectors. Perhaps there
is something with how you are creating your fields.
Ira Goldstein wrote:
We've run into an issue with the term vectors. When indexing a small corpus
(~3k docs, 1.3G) everything works fine, as it does with a small number of
documents from TREC-6 (so we believe that our indexing code is ok). However,
when we tried to index the full TREC-6 corpus (~300,000 docs, 2G) the term
vectors all seem to come back as null. When the indexing process is going on,
we can see the .frq file being built so it appears as if the index routine is
doing its thing.
Has anyone experienced anything similar?
Thanks in advance for any insight you can offer
--Ira Goldstein
------ Original Message ------
Received: Tue, 13 Dec 2005 12:41:07 PM EST
From: "Dan Climan" <[EMAIL PROTECTED]>
To: <java-user@lucene.apache.org>
Subject: Impact of Term Vectors (was ApacheCon next week)
Good question. I was wondering about the impact of adding term vectors with
the various options. For example, is adding term vectors with both
positions
and offsets a significant impact? Which current parts of lucene (including
contributions) take advantage of term vectors being present? I know that
Highlighter class can make use of them if present.
Dan
-----Original Message-----
From: Jeff Rodenburg [mailto:[EMAIL PROTECTED]
Sent: Monday, December 12, 2005 9:08 PM
To: java-user@lucene.apache.org
Subject: Re: ApacheCon next week
Well done, Grant. Very informative.
Question on Term Vectors: with their inclusion in an index, have you
noticed
any degradation in performance, either from a search effiiciency or
maintenance point-of-view? Given the power of term vectors, if the perf
impact is negligible, I'm curious to the reasons why one would NOT include
term vectors in any and every index...
cheers,
j
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
--
-------------------------------------------------------------------
Grant Ingersoll
Sr. Software Engineer
Center for Natural Language Processing
Syracuse University
School of Information Studies
337 Hinds Hall
Syracuse, NY 13244
http://www.cnlp.org
Voice: 315-443-5484
Fax: 315-443-6886
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]