Hi,

I've been trying to enumerate all terms in all documents in a Lucene 4.0 index in order to retrieve their attributes (payloads, positions, etc.).

I have an index with documents containing stored, tokenized fields with term vectors, offsets and payloads. Below is what I have tried so far (I have to admit I don't fully understand this part of the 4.0 API yet).

My question is: can I use TermsEnum, DocsEnum, or DocsAndPositionsEnum to access each term in each document and get its attributes? They all have an attributes() method, but so far I haven't managed to make it return the actual attributes of individual terms (not even the CharTermAttribute).


Thanks,

Piotr Pezik


// Checking field type:

Document doc = dReader.document(1);
System.out.println(doc.getField("myField").fieldType());
// => stored,indexed,tokenized,termVector,indexOptions=DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS

//Getting Terms and TermsEnum:

Terms terms = SlowCompositeReaderWrapper
                .wrap(dReader).terms("myField");
TermsEnum tenum = terms.iterator(TermsEnum.EMPTY);

// Moving to the next term (?)

BytesRef br = tenum.next();

System.out.println(tenum.attributes().hasAttributes());
// => false

System.out.println(tenum.attributes().getAttribute(PositionIncrementAttribute.class));
// => java.lang.IllegalArgumentException: This AttributeSource does not have the attribute 'org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute'.

Bits liveDocs = SlowCompositeReaderWrapper.wrap(dReader).getLiveDocs();

DocsEnum denum = tenum.docs(liveDocs, null);
denum.nextDoc();
System.out.println(denum.attributes().hasAttributes());
// => false

DocsAndPositionsEnum denum2 = tenum.docsAndPositions(liveDocs, null);
denum2.nextDoc();
System.out.println(denum2.attributes().hasAttributes());
// => false



