Re: Retrieving attributes of terms in lucene

Michael McCandless Fri, 13 Sep 2013 09:42:33 -0700

The TypeAttribute is not saved in the index "automatically"; if you
want that, you need to move it over to the payload, which it looks
like you are doing, so then just use DocsAndPositionsEnum.getPayload()
to get it.
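
As a minimal sketch of the round trip (not taken from the original code): the
type string is turned into bytes at index time and decoded again from the
payload at search time. To stay self-contained it uses a plain byte array in
place of Lucene's BytesRef; the class name PayloadTypeCodec and its two
helpers are hypothetical. With Lucene 4.x you would pass the public bytes,
offset and length fields of the BytesRef returned by getPayload() to decode(),
and check for null first, since a position may have no payload. It also names
the charset explicitly, an assumption that differs from the platform-default
getBytes() call in the quoted code.

```java
import java.nio.charset.StandardCharsets;

public class PayloadTypeCodec {

    // Index time: the bytes you would wrap in a BytesRef and hand to
    // payloadAttribute.setPayload(...). The charset is explicit so the
    // search side can decode it the same way on any platform.
    static byte[] encode(String type) {
        return type.getBytes(StandardCharsets.UTF_8);
    }

    // Search time: decode only the [offset, offset + length) slice.
    // A payload may be a window into a larger shared buffer, so the
    // offset and length must be honored rather than decoding the whole array.
    static String decode(byte[] bytes, int offset, int length) {
        return new String(bytes, offset, length, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] payload = encode("identifier");

        // Simulate a shared buffer with unrelated bytes around the payload.
        byte[] shared = new byte[payload.length + 4];
        System.arraycopy(payload, 0, shared, 2, payload.length);

        System.out.println(decode(shared, 2, payload.length));
    }
}
```

Decoding with the offset and length, rather than the whole backing array,
matters because the returned bytes may be reused or shared between positions.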


CharTermAttribute is the term text, i.e. it's the term exposed by
TermsEnum.  In your code you already call .seekExact to position on a
given term ...

Mike McCandless

http://blog.mikemccandless.com


On Wed, Sep 11, 2013 at 9:24 AM, nischal reddy
<nischal.srini...@gmail.com> wrote:
> Hi,
>
> I have written a custom Tokenizer that splits my input text into
> tokens. I have overridden the incrementToken method, and in it I set
> charTermAttribute, offsetAttribute and typeAttribute (please find the
> method below).
>
> @Override
>     final public boolean incrementToken() throws IOException {
>         clearAttributes();
>         if(reader == null){
>             reader = input;
>             initProgressLexer();
>         }
>         TokenType myObj = null;
>         if((myObj = next()) != null){
>             charTermAttribute.append(myObj.tokenText);
>             offsetAttribute.setOffset(myObj.startOffset, myObj.endOffset);
>             typeAttribute.setType(myObj.type);
>             payloadAttribute.setPayload(new BytesRef(myObj.type.getBytes()));
>             return true;
>         }else{
>             return false;
>         }
>     }
>
> Now, when I search for text in my index, I want to retrieve the type,
> offsets and charTermAttribute of the matched tokens.
>
> To achieve this, I use the matched documents to retrieve the
> DocsAndPositionsEnum object, then call startOffset() and endOffset()
> to retrieve the offsets and getPayload() to get the payload. However,
> I am not able to retrieve the type and charTermAttribute values of the
> matched terms. Below is the method where I retrieve the offsets.
>
> private void showHits(TermQuery query, TopDocs hits)
>             throws CorruptIndexException, IOException {
>         ProgressSearchEngine
>                 .debug("Found " + hits.totalHits
>                         + " document(s) that matched query '"
>                         + query.toString() + "':");
>         for (ScoreDoc scoreDoc : hits.scoreDocs) {
>             //Get the document
>             Document doc = iSearcher.doc(scoreDoc.doc);
>             ProgressSearchEngine.debug("File Name:: "
>                     + doc.get(FIELD_FILE_PATH));
>             //Get the terms of that document
>             Terms termsVector = iReader.getTermVector(scoreDoc.doc, query
>                     .getTerm().field());
>
>             if (termsVector != null) {
>                 TermsEnum termsEnum = null;
>                 termsEnum = termsVector.iterator(termsEnum);
>                 //seek to the exact position of the matched term
>                 if (termsEnum.seekExact(
>                         new BytesRef(query.getTerm().text()), false)) {
>
>                     DocsAndPositionsEnum dpEnum = null;
>                     dpEnum = termsEnum.docsAndPositions(null, dpEnum);
>
>                     if (dpEnum != null) {
>
>                          // you need to call nextDoc() to have the enum positioned
>                          if (dpEnum.nextDoc() == 0) {
>
>                              int freq = dpEnum.freq();
>
>
>                                 for(int i=0;i < freq; ++i){
>                                     int position = dpEnum.nextPosition();
>                                     if(position != -1){
>                                         String filePath = doc.get(FIELD_FILE_PATH);
>                                         System.out.println("file path " + filePath);
>                                         System.out.println("Start offset "
>                                                 + dpEnum.startOffset() + " End offset "
>                                                 + dpEnum.endOffset());
>
>                                     }
>                                 }
>
>                          }else{
>
>                              ProgressSearchEngine.debug(
>                                         "Not able to find the offsets for the file: "
>                                         + doc.get(FIELD_FILE_PATH));
>
>                          }
>
>
>                     }
>                 }
>             }
>
>         }
>     }
>
> Can someone please help me get all the attributes that we set in the
> incrementToken method?
>
> And can we add our own attributes apart from the already available ones? If
> yes, how?
>
> TIA,
> Nischal Y

