Hi Usman,

Long ago Lucene switched to reusing these analysis components (per Analyzer, per thread), which explains why createComponents is called only once.
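To make the reuse visible, here is a minimal sketch that counts createComponents calls while analyzing three texts on one thread. It assumes a Lucene 8.x/9.x API (the same `createComponents(String)` signature as the snippet below). `ReuseDemo`, `CountingAnalyzer`, `NO_REUSE` and the field name "body" are hypothetical names for illustration. For comparison it also passes a never-reusing `Analyzer.ReuseStrategy` to `super()`, which is the expert option discussed next.

```java
import java.io.IOException;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.standard.StandardTokenizer;

public class ReuseDemo {

  // Hypothetical strategy that never caches anything, so every tokenStream()
  // call falls through to createComponents.
  static final Analyzer.ReuseStrategy NO_REUSE = new Analyzer.ReuseStrategy() {
    @Override
    public Analyzer.TokenStreamComponents getReusableComponents(Analyzer analyzer, String fieldName) {
      return null; // null tells Analyzer "nothing to reuse, create new components"
    }

    @Override
    public void setReusableComponents(Analyzer analyzer, String fieldName,
        Analyzer.TokenStreamComponents components) {
      // intentionally keep nothing
    }
  };

  // Analyzer that records how often createComponents is invoked.
  static class CountingAnalyzer extends Analyzer {
    int createCalls = 0;

    CountingAnalyzer() { super(); }                        // default per-thread reuse
    CountingAnalyzer(ReuseStrategy strategy) { super(strategy); }

    @Override
    protected TokenStreamComponents createComponents(String fieldName) {
      createCalls++;
      Tokenizer src = new StandardTokenizer();
      return new TokenStreamComponents(src);
    }
  }

  // Analyzes each text in turn on the calling thread and returns the call count.
  static int analyze(CountingAnalyzer analyzer, String... texts) throws IOException {
    for (String text : texts) {
      // TokenStream contract: reset, consume, end, close before the next tokenStream() call.
      try (TokenStream ts = analyzer.tokenStream("body", text)) {
        ts.reset();
        while (ts.incrementToken()) {
          // tokens are discarded; only the createComponents count matters here
        }
        ts.end();
      }
    }
    return analyzer.createCalls;
  }

  public static void main(String[] args) throws IOException {
    String[] docs = {"first bulletin", "second bulletin", "third bulletin"};
    try (CountingAnalyzer reusing = new CountingAnalyzer();
         CountingAnalyzer fresh = new CountingAnalyzer(NO_REUSE)) {
      System.out.println("default reuse: " + analyze(reusing, docs));
      System.out.println("no reuse:      " + analyze(fresh, docs));
    }
  }
}
```

On a single thread the default strategy should report one createComponents call for all three texts, while `NO_REUSE` should report three. Those extra per-document allocations are the indexing-throughput cost of turning reuse off.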
However, the reuse policy is controllable (expert usage), so in theory you could implement an Analyzer.ReuseStrategy that never reuses and pass it to super() when you construct your custom Analyzer. That is generally not a great idea, though: it hurts indexing throughput.

Another possibility is to create a Field with a pre-analyzed TokenStream, bypassing the Analyzer entirely and building your own TokenStream chain that sets these payload values. Usually payloads are set or derived from the incoming tokens and would not be set dynamically from outside.

Or, a parameter like this that changes per document but not per token could be stored in a doc values field instead.

Mike McCandless

http://blog.mikemccandless.com

On Thu, Jun 8, 2023 at 7:08 AM Usman Shaikh <shai...@gmail.com> wrote:

> Hello
>
> I hope somebody can offer suggestions/advice regarding this.
>
> I'm going through some old Lucene code and have a custom Analyzer which
> overrides the createComponents method. See snippet below:
>
> public class BulletinPayloadsAnalyzer extends Analyzer {
>   private boolean bulletin;
>   private float boost;
>
>   BulletinPayloadsAnalyzer(float boost) {
>     this.boost = boost;
>   }
>
>   public void setBulletin(boolean bulletin) {
>     this.bulletin = bulletin;
>   }
>
>   @Override
>   protected TokenStreamComponents createComponents(String fieldName) {
>     Tokenizer src = new StandardTokenizer();
>     BulletinPayloadsFilter result = new BulletinPayloadsFilter(src, boost,
>         bulletin);
>     return new TokenStreamComponents(src, result);
>   }
> }
>
> I then use the boost and bulletin params inside my BulletinPayloadsFilter
> for some specialized logic, e.g. if bulletin is true and a keyword is
> tokenized, then boost the document by setting a PayloadAttribute with the
> boost amount.
> However, I've noticed that when indexing several documents at once, the
> createComponents method is only called the first time.
> For all subsequent documents, execution goes straight into the
> incrementToken method of my custom BulletinPayloadsFilter.
>
> Is there a way of ensuring the createComponents method is called when
> indexing each document? I need to make sure the correct parameters are
> passed to the filter, and these params could change for each document.
>
> Thank you
> Usman
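The pre-analyzed-field and doc-values suggestions from the reply above can be sketched as follows, again assuming a Lucene 8.x/9.x API. `BoostPayloadFilter` is a hypothetical stand-in for Usman's `BulletinPayloadsFilter` (whose code isn't shown): it simply stamps every token with the boost as a 4-byte float payload when bulletin is true, rather than applying any keyword logic. `PreAnalyzedBulletin`, `buildDoc`, and the field names "body", "boost" and "bulletin" are also hypothetical. The doc values fields illustrate the alternative of storing the value once per document; in practice you would likely pick one approach or the other.

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.ByteBuffer;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.tokenattributes.PayloadAttribute;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.FloatDocValuesField;
import org.apache.lucene.document.NumericDocValuesField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.util.BytesRef;

public class PreAnalyzedBulletin {

  // Hypothetical filter: attaches the boost as a payload to every token when bulletin is true.
  static final class BoostPayloadFilter extends TokenFilter {
    private final PayloadAttribute payloadAtt = addAttribute(PayloadAttribute.class);
    private final BytesRef payload;

    BoostPayloadFilter(TokenStream input, float boost, boolean bulletin) {
      super(input);
      // 4-byte big-endian float; null means "no payload" for this document.
      this.payload = bulletin ? new BytesRef(ByteBuffer.allocate(4).putFloat(boost).array()) : null;
    }

    @Override
    public boolean incrementToken() throws IOException {
      if (!input.incrementToken()) {
        return false;
      }
      payloadAtt.setPayload(payload);
      return true;
    }
  }

  // Builds one document with its own freshly constructed TokenStream chain,
  // so the per-document boost/bulletin values reach the filter directly.
  static Document buildDoc(String text, float boost, boolean bulletin) {
    Tokenizer src = new StandardTokenizer();
    src.setReader(new StringReader(text));
    TokenStream stream = new BoostPayloadFilter(src, boost, bulletin);

    Document doc = new Document();
    doc.add(new TextField("body", stream));             // indexed from the stream, not stored
    // Alternative: keep the per-document values in doc values instead of payloads.
    doc.add(new FloatDocValuesField("boost", boost));
    doc.add(new NumericDocValuesField("bulletin", bulletin ? 1 : 0));
    return doc;
  }
}
```

Each `buildDoc` call creates a new chain, so nothing depends on createComponents being re-run; the resulting Document is passed to `IndexWriter.addDocument` as usual, and the IndexWriter's configured Analyzer is not used for the "body" field.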