Hello,
I cannot extract document term vectors from an index, and determined
googling has not turned up much. In short, when I call
IndexReader.getTermVector(docID, field), or call
IndexReader.getTermVectors(docID) and then navigate down to the Terms
for the specified field, I get a null result.

// Indexing:
String bodyText = "this is foobar";
final FieldType BodyOptions = new FieldType();
BodyOptions.setIndexed(true);
BodyOptions.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
BodyOptions.setStored(true);
BodyOptions.setStoreTermVectors(true);
BodyOptions.setTokenized(true);
Document doc = new Document();
doc.add(new Field("body", bodyText, BodyOptions));

When I examine docs in Luke, I can see the term vectors.

// Retrieving (at a later time)
DirectoryReader dirRdr = DirectoryReader.open(FSDirectory.open(new File(path)));
SlowCompositeReaderWrapper rdr = new SlowCompositeReaderWrapper(dirRdr);
for (int i = 0; i < rdr.maxDoc(); ++i) {
    int numTerms = 0;
    Terms terms = rdr.getTermVector(i, "body");
    if (terms != null) {
        TermsEnum term = terms.iterator(null);
        while (term.next() != null) {
            ++numTerms;
        }
        System.out.println("doc " + i + " had " + numTerms + " terms");
    }
    else {
        System.err.println("null term vector on doc " + i);
    }
}

On every doc, the Terms object I get back from getTermVector(i, "body") is null.
Jon
--
Jon Stewart, Principal
(646) 719-0317 | [email protected] | Arlington, VA