Hi Amit,
If you want the frequencies of all the terms (or even just some of the
terms) in all documents, you don't need to use Term Vectors; take a
look at TermEnum and TermDocs.
If you want them for specific documents, then you do need Term Vectors.
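To make the difference concrete, here is a small self-contained sketch of the term-by-term (inverted index) walk. It assumes a hypothetical in-memory postings map standing in for the on-disk index, so it runs without Lucene. In Lucene 2.0 the outer loop would be driven by TermEnum (term(), next()) and the inner loop by TermDocs (seek(...), next(), doc(), freq()).

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Toy model of walking an inverted index term by term. In Lucene 2.0 the
 * outer loop corresponds to TermEnum and the inner loop to TermDocs.
 * The in-memory postings map is a hypothetical stand-in for the index.
 */
public class PostingsWalk {

    /** One posting: a document number and the term's frequency in it. */
    static final class Posting {
        final int doc;
        final int freq;

        Posting(int doc, int freq) {
            this.doc = doc;
            this.freq = freq;
        }
    }

    /** Builds doc -> (term -> freq) for every document in one pass over the postings. */
    static Map<Integer, Map<String, Integer>> freqsByDoc(Map<String, List<Posting>> postings) {
        Map<Integer, Map<String, Integer>> byDoc = new TreeMap<>();
        for (Map.Entry<String, List<Posting>> entry : postings.entrySet()) { // one term at a time
            for (Posting p : entry.getValue()) {                            // every doc containing it
                byDoc.computeIfAbsent(p.doc, d -> new TreeMap<>()).put(entry.getKey(), p.freq);
            }
        }
        return byDoc;
    }

    /** Hypothetical sample index: two terms spread over three documents. */
    static Map<String, List<Posting>> sample() {
        Map<String, List<Posting>> postings = new TreeMap<>();
        postings.put("index", Arrays.asList(new Posting(0, 3), new Posting(2, 1)));
        postings.put("lucene", Arrays.asList(new Posting(0, 2), new Posting(1, 5)));
        return postings;
    }

    public static void main(String[] args) {
        // prints {0={index=3, lucene=2}, 1={lucene=5}, 2={index=1}}
        System.out.println(freqsByDoc(sample()));
    }
}
```

Pulling a single document's frequencies out of this structure still means scanning every term's postings, which is why per-document lookups are better served by term vectors.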
You may save some CPU cycles by keeping only positions
(Field.TermVector.WITH_POSITIONS instead of WITH_POSITIONS_OFFSETS,
assuming you don't need offsets), but I haven't confirmed this. I just
figure it is slightly less data to read and write, so it should be faster.
No, you do not need to store your fields (Field.Store.NO is fine) to
have a term vector. Term vectors are written to a different part of the
index than stored fields.
-Grant
On Aug 9, 2006, at 3:13 PM, Amit Kumar wrote:
Hi Lucene Users,
I am using Lucene indices to get term frequencies. I just wanted to
check with you about the time it takes to retrieve these term
frequencies. Please tell me whether I can improve the code or the index,
or whether this is expected. It takes 8 to 9 seconds to retrieve the
term frequency vectors of all 1030 documents, with an index size of
~530 MB.
Another question: do I need Field.Store.YES to get the term frequency
vector?
Index Details:
-------------------
Size: 532 MB
1032 documents, each with between 600 and 100,000 terms
The field is indexed with Field.Store.YES, Field.Index.TOKENIZED,
Field.TermVector.WITH_POSITIONS_OFFSETS
Term Freq Retrieval Time Values:
-------------------------------------
The time ranges from 8 to 9 seconds for this loop:
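// reader is an open IndexReader; field is the term-vector field name
long start = System.currentTimeMillis();
TermFreqVector termFreqVector;
// maxDoc() covers every document number instead of a hard-coded 1030
for (int i = 0; i < reader.maxDoc(); i++) {
    if (!reader.isDeleted(i)) {
        // May return null if no term vector was stored for this doc/field
        termFreqVector = reader.getTermFreqVector(i, field);
    }
}
long elapsed = System.currentTimeMillis() - start;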
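The loop above fetches each vector but does not use it. A minimal sketch of one way to consume a result follows, assuming the two parallel arrays a Lucene 2.0 TermFreqVector exposes through getTerms() and getTermFrequencies(). Plain arrays stand in for the vector so the sketch runs without Lucene, and the class name TermVectorToMap is hypothetical.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Converts the parallel arrays of a term vector into a term -> freq map.
 * In Lucene 2.0 the arrays would come from TermFreqVector.getTerms() and
 * TermFreqVector.getTermFrequencies(); plain arrays are used here.
 */
public class TermVectorToMap {

    static Map<String, Integer> toMap(String[] terms, int[] freqs) {
        // A missing vector (e.g. field indexed without term vectors) yields an empty map.
        if (terms == null || freqs == null) {
            return Collections.emptyMap();
        }
        if (terms.length != freqs.length) {
            throw new IllegalArgumentException("terms and freqs must have the same length");
        }
        Map<String, Integer> result = new LinkedHashMap<>();
        for (int i = 0; i < terms.length; i++) {
            result.put(terms[i], freqs[i]); // freqs[i] is the count of terms[i] in this document
        }
        return result;
    }

    public static void main(String[] args) {
        String[] terms = {"index", "lucene", "vector"};
        int[] freqs = {3, 2, 1};
        System.out.println(toMap(terms, freqs)); // {index=3, lucene=2, vector=1}
    }
}
```

A LinkedHashMap keeps the order of the input arrays, which for a term vector is the sorted term order.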
Hardware and Memory Settings:
-------------------------------------------
-Xmx2048m -XX:PermSize=16m -XX:MaxPermSize=128m
Dual 1800 MHz Opteron on 32-bit Linux 2.6.15.2; Lucene 2.0.0.
Can I get better results, and if so, how?
Many thanks for your help.
-Amit
---------------------------------------------------------
Amit Kumar
Research Programmer
The Graduate School of Library and Information Science
University of Illinois, Urbana Champaign IL, 61820
phone: 217-333-4118 fax: 217-244-3302
---------------------------------------------------------
------------------------------------------------------
Grant Ingersoll
http://www.grantingersoll.com/