Re: How to get terms of a particular field of a particular document

Michael Wechner Mon, 13 Nov 2023 01:31:00 -0800

I just realized that the code can be even simpler:

String text = "Apache Lucene is a great search library!";
// no field name, no term vectors (null), no offset limit (-1): the text is simply analyzed
TokenStream stream = TokenSources.getTokenStream(null, null, text, new StandardAnalyzer(), -1);

stream.reset(); // INFO: See https://lucene.apache.org/core/9_8_0/core/org/apache/lucene/analysis/TokenStream.html
while (stream.incrementToken()) {
    log.info("Token: " + stream.getAttribute(CharTermAttribute.class));
}
stream.end();
stream.close();


The code also seems to work without stream.end(), but if I understand the
documentation at
https://lucene.apache.org/core/9_8_0/core/org/apache/lucene/analysis/TokenStream.html
correctly, one should call it.
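For reference, the consumer workflow that the TokenStream javadoc describes (addAttribute, reset(), an incrementToken() loop, end(), close()) can be sketched as a small self-contained helper. This is only a sketch, assuming Lucene 9.x with lucene-core on the classpath. It calls Analyzer.tokenStream() directly instead of TokenSources, which, as far as I can tell, is what TokenSources.getTokenStream() falls back to when no term vectors are passed. The class name AnalyzeText and the field name "content" are made up for illustration.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class AnalyzeText {

    /** Runs the analyzer over the text and returns the produced terms in order. */
    static List<String> analyze(Analyzer analyzer, String field, String text) throws IOException {
        List<String> terms = new ArrayList<>();
        // try-with-resources calls close() even if incrementToken() throws
        try (TokenStream stream = analyzer.tokenStream(field, text)) {
            // register the attribute once; the same instance is updated for every token
            CharTermAttribute termAtt = stream.addAttribute(CharTermAttribute.class);
            stream.reset();                 // mandatory before the first incrementToken()
            while (stream.incrementToken()) {
                terms.add(termAtt.toString());
            }
            stream.end();                   // lets the stream record end-of-stream state, e.g. the final offset
        }
        return terms;
    }

    public static void main(String[] args) throws IOException {
        try (Analyzer analyzer = new StandardAnalyzer()) {
            System.out.println(analyze(analyzer, "content", "Apache Lucene is a great search library!"));
        }
    }
}
```

With the default StandardAnalyzer of Lucene 9.x (no stop words, lowercasing), this should print [apache, lucene, is, a, great, search, library].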

Thanks

Michael


On 12.11.23 at 23:36, Michael Wechner wrote:

Thanks again. I think I have now found what I wanted (without needing
the Highlighter):

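IndexReader reader = DirectoryReader.open(FSDirectory.open(Paths.get("index_directory")));
log.info("Get terms of document ...");
// text: the stored value of "field_name" for the document of interest
// analyzer: the same analyzer that was used at index time
TokenStream stream = TokenSources.getTokenStream("field_name", null, text,
        analyzer, -1);
stream.reset();
while (stream.incrementToken()) {
    log.info("Term: " + stream.getAttribute(CharTermAttribute.class));
}
stream.end();
stream.close();
reader.close();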
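Mikhail's second option (read the stored value of one document, analyze it again) can be put together end to end as follows. This is a sketch under stated assumptions: Lucene 9.5 or later (for IndexReader.storedFields()), an in-memory ByteBuffersDirectory instead of "index_directory", a hypothetical field "content" indexed with Field.Store.YES, and Analyzer.tokenStream() used directly instead of TokenSources. The analyzer should be the one used at index time, otherwise the terms may differ from what is actually in the index.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class StoredFieldTerms {

    /** Loads the stored value of the field for docId and re-analyzes it. */
    static List<String> termsOfDocument(IndexReader reader, int docId, String field,
                                        Analyzer analyzer) throws IOException {
        List<String> terms = new ArrayList<>();
        // storedFields() exists from Lucene 9.5 on
        String text = reader.storedFields().document(docId).get(field);
        if (text == null) {
            return terms; // field absent or not stored for this document
        }
        try (TokenStream stream = analyzer.tokenStream(field, text)) {
            CharTermAttribute termAtt = stream.addAttribute(CharTermAttribute.class);
            stream.reset();
            while (stream.incrementToken()) {
                terms.add(termAtt.toString());
            }
            stream.end();
        }
        return terms;
    }

    /** Indexes two small documents in memory and returns the terms of the second one. */
    static List<String> demo() throws IOException {
        try (Directory dir = new ByteBuffersDirectory();
             Analyzer analyzer = new StandardAnalyzer()) {
            try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(analyzer))) {
                for (String value : new String[] {"Hello world", "Apache Lucene is a great search library!"}) {
                    Document doc = new Document();
                    doc.add(new TextField("content", value, Field.Store.YES));
                    writer.addDocument(doc);
                }
            }
            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                // single segment, no deletions: docId 1 is the second document added
                return termsOfDocument(reader, 1, "content", analyzer);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(demo());
    }
}
```

demo() should return [apache, lucene, is, a, great, search, library]. Re-analysis yields every token occurrence in order, including duplicates, which is different from the de-duplicated view of the term dictionary.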

Thanks

Michael

On 12.11.2023 at 22:00, Mikhail Khludnev <[email protected]> wrote:

It's something like what is done over there:
https://github.com/apache/lucene/blob/4e2ce76b3e131ba92b7327a52460e6c4d92c5e33/lucene/highlighter/src/java/org/apache/lucene/search/highlight/Highlighter.java#L159


On Sun, Nov 12, 2023 at 11:42 PM Michael Wechner <[email protected]> wrote:

Hi Mikhail

Thank you very much for your feedback!

I have found various examples for the first option when running a query,
e.g.

https://howtodoinjava.com/lucene/lucene-search-highlight-example/

but I don't understand how to implement the second option, i.e. how to
get the extracted terms of a document field independently of a query.

Can you maybe give a code example?

Thanks

Michael



On 12.11.23 at 18:46, Mikhail Khludnev wrote:

Hello,
This is what highlighters do. There are two options:
  - index term vectors and obtain them at search time;
  - obtain the stored field value, analyse it again, and collect all terms.
Good luck
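For the first option the field has to be indexed with term vectors; the terms of one document can then be read back without re-analysis. A minimal sketch, assuming Lucene 9.5 or later (for IndexReader.termVectors()), an in-memory index, and a hypothetical field "content". The class and method names are illustrative only.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;
import org.apache.lucene.util.BytesRef;

public class TermVectorTerms {

    /** Returns the distinct terms stored in the term vector of the field for docId. */
    static List<String> termsFromVector(IndexReader reader, int docId, String field)
            throws IOException {
        List<String> result = new ArrayList<>();
        // termVectors() exists from Lucene 9.5 on; null means no vector for this doc/field
        Terms vector = reader.termVectors().get(docId, field);
        if (vector == null) {
            return result;
        }
        TermsEnum termsEnum = vector.iterator();
        BytesRef term;
        while ((term = termsEnum.next()) != null) {
            result.add(term.utf8ToString()); // terms come back in sorted (byte) order
        }
        return result;
    }

    static List<String> demo() throws IOException {
        // same settings as TextField.TYPE_NOT_STORED, plus term vectors
        FieldType withVectors = new FieldType(TextField.TYPE_NOT_STORED);
        withVectors.setStoreTermVectors(true);
        withVectors.freeze();

        try (Directory dir = new ByteBuffersDirectory();
             Analyzer analyzer = new StandardAnalyzer()) {
            try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(analyzer))) {
                Document doc = new Document();
                doc.add(new Field("content", "Apache Lucene is a great search library!", withVectors));
                writer.addDocument(doc);
            }
            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                return termsFromVector(reader, 0, "content");
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(demo());
    }
}
```

Unlike re-analysis, a term vector holds each distinct term once, in sorted order, so demo() should return [a, apache, great, is, library, lucene, search]. The price is the extra index size for the stored vectors.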

On Sun, Nov 12, 2023 at 7:47 PM Michael Wechner <[email protected]> wrote:

Hi

IIUC, I can get all terms of a particular field of an index with:

IndexReader reader = DirectoryReader.open(FSDirectory.open(Paths.get("index_directory")));
List<LeafReaderContext> list = reader.leaves();
for (LeafReaderContext lrc : list) {
    Terms terms = lrc.reader().terms("field_name");
    if (terms != null) {
        TermsEnum termsEnum = terms.iterator();
        BytesRef term = null;
        while ((term = termsEnum.next()) != null) {
            System.out.println("Term: " + term.utf8ToString());
        }
    }
}
reader.close();
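One detail about the loop above: it visits every segment separately, so a term that occurs in several segments is printed once per segment. If a single de-duplicated list is wanted, MultiTerms.getTerms() offers a merged view over all segments. A sketch, assuming Lucene 9.x, an in-memory index with a hypothetical field "content", and a commit after each document only to force more than one segment.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.MultiTerms;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;
import org.apache.lucene.util.BytesRef;

public class FieldTerms {

    /** Returns every distinct term of the field across all segments, in sorted order. */
    static List<String> allTerms(IndexReader reader, String field) throws IOException {
        List<String> result = new ArrayList<>();
        // merged view over all leaves; null if no segment contains the field
        Terms terms = MultiTerms.getTerms(reader, field);
        if (terms == null) {
            return result;
        }
        TermsEnum termsEnum = terms.iterator();
        BytesRef term;
        while ((term = termsEnum.next()) != null) {
            result.add(term.utf8ToString());
        }
        return result;
    }

    static List<String> demo() throws IOException {
        try (Directory dir = new ByteBuffersDirectory();
             Analyzer analyzer = new StandardAnalyzer()) {
            try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(analyzer))) {
                for (String value : new String[] {"Lucene rocks", "Lucene search"}) {
                    Document doc = new Document();
                    doc.add(new TextField("content", value, Field.Store.NO));
                    writer.addDocument(doc);
                    writer.commit(); // typically yields one small segment per document
                }
            }
            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                return allTerms(reader, "content");
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(demo());
    }
}
```

Here "lucene" appears in both segments but is listed once, so demo() should return [lucene, rocks, search]. Like the per-leaf loop, this reads the terms dictionary, so terms of deleted documents may still show up until their segments are merged.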

But how can I get all terms of a particular field of a particular
document?

Thanks
Michael

P.S.: By the way, would it make sense to update the Lucene FAQ

https://cwiki.apache.org/confluence/display/lucene/lucenefaq#LuceneFAQ-HowdoIretrieveallthevaluesofaparticularfieldthatexistswithinanindex,acrossalldocuments

with the code above?
I can do this, but I want to make sure that I don't update it in the wrong
way.



--
Sincerely yours
Mikhail Khludnev
