Well... maybe something like ConditionalTokenFilter?
https://lucene.apache.org/core/8_5_1/analyzers-common/org/apache/lucene/analysis/miscellaneous/ConditionalTokenFilter.html
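
For the stored-value part of the question, one option that does not depend on the analysis chain might be to split the header off in application code before the document is built, then index the full text and store only the body. Below is a minimal sketch of the splitting step in plain Java. The class name HeaderSplitter and the "---" delimiter line are hypothetical, chosen only for illustration; nothing here is a Lucene API.

```java
import java.util.Objects;

/** Hypothetical helper: separates a leading control header from the body text. */
final class HeaderSplitter {
    /** Assumed delimiter line that ends the header; pick whatever your documents use. */
    static final String DELIMITER = "\n---\n";

    /** Control information meant to guide the indexing process. */
    final String header;
    /** The actual text, i.e. the part that should end up stored. */
    final String body;

    private HeaderSplitter(String header, String body) {
        this.header = header;
        this.body = body;
    }

    /** Splits raw text at the first delimiter; text without a delimiter is all body. */
    static HeaderSplitter split(String raw) {
        Objects.requireNonNull(raw, "raw");
        int idx = raw.indexOf(DELIMITER);
        if (idx < 0) {
            // No header present: keep everything as body.
            return new HeaderSplitter("", raw);
        }
        String header = raw.substring(0, idx);
        String body = raw.substring(idx + DELIMITER.length());
        return new HeaderSplitter(header, body);
    }
}
```

With the parts separated, the raw text (header included) could presumably go into an unstored indexed field such as `new TextField("content", raw, Field.Store.NO)`, and only `body` into a `new StoredField("content_stored", body)`. The field names are assumptions, and I have not tried this against your setup.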

On Mon, Apr 24, 2023 at 11:40 PM Wang, Guan <wan...@med.umich.edu> wrote:

> Hi Mikhail,
>
> Thank you for the definitive answer!
>
> I could "solve" this by adding a header in the document with proper
> information to guide the indexing process. Header will be parsed then
> ignored by the tokenizer. However, the header along with the actual text
> will be stored together in that field...
>
> I wonder (again...) if it's possible I may control which part of the text
> shall be stored during the index process? In other words, is it possible to
> strip the header when storing the text into the field?
>
> Best regards,
>
> Guan
>
> -----Original Message-----
> From: Mikhail Khludnev <m...@apache.org>
> Sent: Monday, April 24, 2023 4:20 PM
> To: java-user@lucene.apache.org
> Subject: Re: Can an analyzer access other field's data during index time?
>
> External Email - Use Caution
>
> Hello Guan.
> It reminds me https://youtu.be/EkkzSLstSAE?t=1531 timecode.
> I'm afraid it's quite far from the existing codebase where the Field has
> no reference to enclosing Document. sigh.
>
>
> On Mon, Apr 24, 2023 at 6:00 PM Wang, Guan <wan...@med.umich.edu> wrote:
>
> > Hi,
> >
> > I understand Lucene analyzer is per field basis. But I wonder if it's
> > even possible for an analyzer on field A to be able to access data in
> > field B during the index process on any stage, saying CharFilter,
> > Tokenizer or TokenFilter?
> >
> > I'd like to control the behavior of the indexing process for field A
> > based upon the value in field B.
> >
> > Mighty Lucene community, please let me know if this is doable...
> >
> > Many thanks,
> >
> > Guan
> > **********************************************************
> > Electronic Mail is not secure, may not be read every day, and should
> > not be used for urgent or sensitive issues
> >
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> https://t.me/MUST_SEARCH
> A caveat: Cyrillic!

--
Sincerely yours
Mikhail Khludnev
https://t.me/MUST_SEARCH
A caveat: Cyrillic!