Dear list,

I'm studying the Lucene index file formats and I wonder: after having initialized a field with Field(String name, String value, Field.Store store, Field.Index index), where is the value String stored?
I understand that the chosen analyzer processes that value, including tokenization, and returns a TokenStream from which the indexer retrieves the attributes that it stores in the index. When I inspect the term infos (tis) file in the index directory with a binary editor, I can see every single token (term).

As an experiment, I implemented an analyzer that transforms the value passed to the field, and I noticed the following: the TokenStream correctly generates the terms that end up being stored in the tis file, but the original input value is still displayed as the field value when I retrieve a document from the index and print it with Document.toString().

I tried to analyse the Field's token stream, but tokenStreamValue() returns null; is that normal when retrieving a document from an existing index?

Can someone let me know what happens to a Field's value string, and at which point in the pipeline it is replaced by the (term) attributes generated by the TokenStream?

Thank you very much!

Best,
Carsten

-- 
Carsten Schnober
Institut für Deutsche Sprache | http://www.ids-mannheim.de
Projekt KorAP -- Korpusanalyseplattform der nächsten Generation
http://korap.ids-mannheim.de/ | Tel.: +49-(0)621-1581-238
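
P.S. For illustration, the observed behaviour can be modelled with a toy sketch in plain Java. This is not Lucene code: ToyIndex, addField, storedValue, and the lowercasing whitespace analyzer are names invented for this example. The sketch only assumes that the stored value and the indexed terms are kept in two separate places, which would be consistent with the observation above but is an assumption here, not a description of Lucene's internals.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;
import java.util.function.Function;

// Toy model only: none of these names exist in Lucene.
final class ToyIndex {
    // Plays the role of the stored field data: the raw value, kept verbatim.
    private final Map<String, String> storedValues = new HashMap<>();
    // Plays the role of the term dictionary (what shows up in the tis file).
    private final TreeSet<String> terms = new TreeSet<>();

    // Hypothetical analyzer: any function from the raw value to a list of terms.
    void addField(String name, String value, Function<String, List<String>> analyzer) {
        storedValues.put(name, value);        // stored path: the analyzer is not involved
        terms.addAll(analyzer.apply(value));  // indexed path: only the analyzer output
    }

    String storedValue(String name) {
        return storedValues.get(name);
    }

    List<String> terms() {
        return new ArrayList<>(terms);
    }

    // Example analyzer that changes its input: lowercases, then splits on whitespace.
    static List<String> lowercaseWhitespace(String value) {
        List<String> out = new ArrayList<>();
        for (String token : value.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!token.isEmpty()) {
                out.add(token);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.addField("body", "Hello Lucene World", ToyIndex::lowercaseWhitespace);
        System.out.println(index.terms());             // terms as produced by the analyzer
        System.out.println(index.storedValue("body")); // the untouched input value
    }
}
```

In this model the analyzer changes the terms but never touches the stored value, which mirrors what Document.toString() shows for a retrieved document.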