Document term vectors in Lucene 4

Jon Stewart Wed, 16 Jan 2013 21:52:48 -0800

Hello,

I cannot extract document term vectors from an index, and have not
turned up much despite some determined googling. In short, when I call
IndexReader.getTermVector(docID, field) or
IndexReader.getTermVectors(docID) and then navigate down to the Terms
for the specified field, I get a null result.


// Indexing:
  String bodyText = "this is foobar";
  final FieldType BodyOptions = new FieldType();
  BodyOptions.setIndexed(true);
  BodyOptions.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
  BodyOptions.setStored(true);
  BodyOptions.setStoreTermVectors(true);
  BodyOptions.setTokenized(true);
  Document doc = new Document();
  doc.add(new Field("body", bodyText, BodyOptions));

When I examine docs in Luke, I can see the term vectors.

// Retrieving (at a later time)
  DirectoryReader dirRdr =
      DirectoryReader.open(FSDirectory.open(new File(path)));
  SlowCompositeReaderWrapper rdr = new SlowCompositeReaderWrapper(dirRdr);
  for (int i = 0; i < rdr.maxDoc(); ++i) {
    int numTerms = 0;
    Terms terms = rdr.getTermVector(i, "body");
    if (terms != null) {
      TermsEnum term = terms.iterator(null);
      while (term.next() != null) {
        ++numTerms;
      }
      System.out.println("doc " + i + " had " + numTerms + " terms");
    }
    else {
      System.err.println("null term vector on doc " + i);
    }
  }

On every doc, the Terms object I get back from getTermVector(i, "body") is null.
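
To be concrete about what I expect back per document: a term vector is
essentially a per-document map from each term to its frequency. Here is a
minimal plain-Java sketch of that expectation. It is not Lucene code: the
class and method names (TermVectorSketch, termFrequencies) are made up for
illustration, and it assumes lowercasing plus whitespace tokenization with
no stop-word removal, which may differ from what the analyzer used at
indexing time actually produces.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Lucene: models a term vector as a
// sorted map from term to its frequency within a single document.
public class TermVectorSketch {

    // Assumption: lowercase + whitespace tokenization, no stop-word removal.
    public static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> freqs = new TreeMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!token.isEmpty()) {
                // Increment the count for this term, starting at 1.
                freqs.merge(token, 1, Integer::sum);
            }
        }
        return freqs;
    }

    public static void main(String[] args) {
        Map<String, Integer> tv = termFrequencies("this is foobar");
        System.out.println("doc had " + tv.size() + " terms: " + tv);
    }
}
```

Under those assumptions the sample body text yields three distinct terms,
each with frequency 1. An analyzer that drops stop words would yield fewer,
but still a non-null result.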


Jon
--
Jon Stewart, Principal
(646) 719-0317 | [email protected] | Arlington, VA

