Hi Mikhail,

Thank you for introducing the abstract class ConditionalTokenFilter to me! I took a quick look: it wraps the upstream TokenStream and applies filtering conditionally.

So, if I have a document like:

HEADER
TEXT
TEXT

Implementing ConditionalTokenFilter would let me tokenize only lines 2 and 3. However, all 3 lines would still be stored in the field if index=true and stored=true... I wonder if I could store only lines 2 and 3 in the field in such a scenario?

Many thanks,
Guan

-----Original Message-----
From: Mikhail Khludnev <m...@apache.org>
Sent: Monday, April 24, 2023 4:56 PM
To: java-user@lucene.apache.org
Subject: Re: Can an analyzer access other field's data during index time?

Well.. maybe something like
https://lucene.apache.org/core/8_5_1/analyzers-common/org/apache/lucene/analysis/miscellaneous/ConditionalTokenFilter.html
?

On Mon, Apr 24, 2023 at 11:40 PM Wang, Guan <wan...@med.umich.edu> wrote:

> Hi Mikhail,
>
> Thank you for the definitive answer!
>
> I could "solve" this by adding a header in the document with proper
> information to guide the indexing process. The header will be parsed, then
> ignored by the tokenizer. However, the header along with the actual
> text will be stored together in that field...
>
> I wonder (again...) if it's possible to control which part of the
> text is stored during the indexing process? In other words, is it
> possible to strip the header when storing the text into the field?
>
> Best regards,
>
> Guan
>
> -----Original Message-----
> From: Mikhail Khludnev <m...@apache.org>
> Sent: Monday, April 24, 2023 4:20 PM
> To: java-user@lucene.apache.org
> Subject: Re: Can an analyzer access other field's data during index time?
>
> Hello Guan.
> It reminds me of https://youtu.be/EkkzSLstSAE?t=1531 (timecode).
> I'm afraid it's quite far from the existing codebase, where the Field
> has no reference to the enclosing Document. sigh.
>
>
> On Mon, Apr 24, 2023 at 6:00 PM Wang, Guan <wan...@med.umich.edu> wrote:
>
> > Hi,
> >
> > I understand that Lucene analyzers work on a per-field basis. But I
> > wonder if it's even possible for an analyzer on field A to access
> > data in field B at any stage of the indexing process, say
> > CharFilter, Tokenizer or TokenFilter?
> >
> > I'd like to control the behavior of the indexing process for field A
> > based upon the value in field B.
> >
> > Mighty Lucene community, please let me know if this is doable...
> >
> > Many thanks,
> >
> > Guan
> > **********************************************************
> > Electronic Mail is not secure, may not be read every day, and should
> > not be used for urgent or sensitive issues
> >
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> https://t.me/MUST_SEARCH
> A caveat: Cyrillic!

--
Sincerely yours
Mikhail Khludnev
https://t.me/MUST_SEARCH
A caveat: Cyrillic!
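
On the question of storing only lines 2 and 3: the thread does not settle this, but one possible approach is to split the header off in application code before the document reaches Lucene. The following is a minimal sketch only. HeaderSplitter, Parts and split are hypothetical names, not Lucene API, and the sketch assumes the header is always exactly the first line of the raw text.

```java
import java.util.Objects;

/** Hypothetical helper (not part of Lucene): separates a one-line header from the text after it. */
final class HeaderSplitter {

    /** The two parts of a raw document. */
    static final class Parts {
        final String header; // first line, without its line terminator
        final String body;   // everything after the first line break

        Parts(String header, String body) {
            this.header = header;
            this.body = body;
        }
    }

    private HeaderSplitter() {}

    /**
     * Assumption: the header is exactly the first line of the input.
     * Input without any line break is treated as a header with an empty body.
     */
    static Parts split(String raw) {
        Objects.requireNonNull(raw, "raw");
        int newline = raw.indexOf('\n');
        if (newline < 0) {
            return new Parts(raw, "");
        }
        String header = raw.substring(0, newline);
        // Tolerate Windows line endings (\r\n) on the header line.
        if (header.endsWith("\r")) {
            header = header.substring(0, header.length() - 1);
        }
        return new Parts(header, raw.substring(newline + 1));
    }
}
```

With the parts separated, the body alone could be added as a StoredField, while the full text (header included, so the analysis chain can still read it) could be added under the same field name as a TextField with Field.Store.NO. Whether that fits the use case discussed above is an assumption; it has not been tested against the setup in this thread.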