If you store the term vector at index time, you can ask the IndexReader for it with the getTermFreqVector() method. It returns a TermFreqVector, which should have the information you need.
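
As a rough sketch of how the two statistics asked about in the question (unique terms and average tf per document) could be derived from a term vector: TermFreqVector exposes getTerms() and getTermFrequencies() as parallel arrays. The arrays below are made-up stand-ins for those return values so the snippet runs without an index. The class and method names (TermVectorStats, uniqueTerms, averageTermFrequency) and the "contents" field name in the comment are illustrative assumptions, not part of Lucene.

```java
final class TermVectorStats {

    // A term vector holds one entry per distinct term in the field,
    // so the number of unique terms is just the array length.
    static int uniqueTerms(String[] terms) {
        return terms.length;
    }

    // Average term frequency: total occurrences divided by distinct terms.
    // Returns 0.0 for an empty vector to avoid dividing by zero.
    static double averageTermFrequency(int[] freqs) {
        if (freqs.length == 0) {
            return 0.0;
        }
        long total = 0;
        for (int f : freqs) {
            total += f;
        }
        return (double) total / freqs.length;
    }

    public static void main(String[] args) {
        // With a real index these would come from something like:
        //   TermFreqVector tfv = reader.getTermFreqVector(docId, "contents");
        //   String[] terms = tfv.getTerms();
        //   int[] freqs = tfv.getTermFrequencies();
        // ("contents" is an assumed field name; hypothetical sample data below.)
        String[] terms = {"index", "lucene", "term", "vector"};
        int[] freqs = {2, 5, 3, 2};

        System.out.println("unique terms: " + uniqueTerms(terms));
        System.out.println("average tf:   " + averageTermFrequency(freqs));
    }
}
```

Check the result of getTermFreqVector() for null before using it, since that is what you get back when no term vector was stored for the field.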

[EMAIL PROTECTED] wrote:

I hope that this isn't a newbie's question, but let me
ask the more general question.  While IndexReader can
return the documents containing the term t, I need to
do the opposite.  Is there a method, given document d,
that will return all of the terms in that document (I
need to calculate the average tf and the number of
unique terms in each document)?
After indexing a set of plain text files using
org.apache.lucene.demo.IndexFiles, I looked at
Document.fields, but all that it returned was:

Text<path:C:\text\02\7laft10.txt>
Keyword<modified:0efvmrdgi>

Any insight would be appreciated.

Thanks
-- MG

Chris Hostetter <[EMAIL PROTECTED]> wrote:

------ Original Message ------
Received: Fri, 28 Oct 2005 08:22:04 PM EDT
From: Chris Hostetter <[EMAIL PROTECTED]>
To:  java-user@lucene.apache.org
Subject: Re: Term Vectors

:  "Now, you can get these term vectors per documents with the Lucene API
: if the index was built with the term vectors option."
:
:   How does one invoke the term vectors option when building the index and
: where can one find a list of the various options (I really did try looking
: at the docs, but could not find any reference to this).

there are very few generic options that apply when "building the index"
.. most options are specific to the individual documents as you add them
-- you can choose to store the TermVectors for the "FOO" field of one
document, but leave them out of another.

Options like whether or not a Field is indexed, stored, tokenized, or has
its TermVector stored are all controlled when you construct the Field
object prior to adding it to the document...


http://lucene.apache.org/java/docs/api/org/apache/lucene/document/Field.html
http://lucene.apache.org/java/docs/api/org/apache/lucene/document/Document.html
-Hoss

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



--
Grant Ingersoll
Sr. Software Engineer
Center for Natural Language Processing
Syracuse University
School of Information Studies
337 Hinds Hall
Syracuse, NY 13244
http://www.cnlp.org
Voice: 315-443-5484 Fax: 315-443-6886
