Well, maybe something like
https://lucene.apache.org/core/8_5_1/analyzers-common/org/apache/lucene/analysis/miscellaneous/ConditionalTokenFilter.html
?
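For instance, a rough sketch of that idea. It assumes Lucene 8.x/9.x with the analyzers-common module on the classpath. The header convention (a leading `#lower` or `#keep` token) and the names HeaderDrivenAnalysis, HeaderDrivenLowerCaseFilter, newAnalyzer and terms are hypothetical. ConditionalTokenFilter, LowerCaseFilter, StopFilter and WhitespaceTokenizer are real Lucene classes. For each token, the filter uses the header token seen earlier in the same field value to decide whether to run the wrapped LowerCaseFilter. A StopFilter then drops the header tokens so they are not indexed.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.CharArraySet;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.StopFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.miscellaneous.ConditionalTokenFilter;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class HeaderDrivenAnalysis {

  // Hypothetical header tokens: a convention assumed for this sketch, not a Lucene feature.
  static final String LOWER = "#lower";
  static final String KEEP = "#keep";
  static final CharArraySet HEADER_TOKENS = new CharArraySet(Arrays.asList(LOWER, KEEP), false);

  /** Runs the wrapped LowerCaseFilter only on tokens that follow a "#lower" header token. */
  static final class HeaderDrivenLowerCaseFilter extends ConditionalTokenFilter {
    private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
    private boolean lowercase = false;

    HeaderDrivenLowerCaseFilter(TokenStream input) {
      super(input, LowerCaseFilter::new);
    }

    @Override
    protected boolean shouldFilter() throws IOException {
      String term = termAtt.toString();
      if (term.equals(LOWER) || term.equals(KEEP)) {
        lowercase = term.equals(LOWER); // remember the mode for the following tokens
        return false;                   // the header token itself passes through unchanged
      }
      return lowercase;
    }

    @Override
    public void reset() throws IOException {
      super.reset();
      lowercase = false; // forget the previous field value's header
    }
  }

  static Analyzer newAnalyzer() {
    return new Analyzer() {
      @Override
      protected TokenStreamComponents createComponents(String fieldName) {
        Tokenizer source = new WhitespaceTokenizer();
        TokenStream sink = new HeaderDrivenLowerCaseFilter(source);
        sink = new StopFilter(sink, HEADER_TOKENS); // drop the header tokens from the index
        return new TokenStreamComponents(source, sink);
      }
    };
  }

  /** Collects the terms the analyzer produces for the given text. */
  static List<String> terms(Analyzer analyzer, String text) throws IOException {
    List<String> out = new ArrayList<>();
    try (TokenStream ts = analyzer.tokenStream("body", text)) {
      CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
      ts.reset();
      while (ts.incrementToken()) {
        out.add(term.toString());
      }
      ts.end();
    }
    return out;
  }

  public static void main(String[] args) throws IOException {
    try (Analyzer analyzer = newAnalyzer()) {
      System.out.println(terms(analyzer, "#lower Hello World")); // [hello, world]
      System.out.println(terms(analyzer, "#keep Hello World"));  // [Hello, World]
    }
  }
}
```

This still only sees field A's own text. The "header" is simply a way to smuggle field B's value into field A's stream, because the analyzer cannot reach the other field.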

On Mon, Apr 24, 2023 at 11:40 PM Wang, Guan <wan...@med.umich.edu> wrote:

> Hi Mikhail,
>
> Thank you for the definitive answer!
>
> I could "solve" this by adding a header to the document with the
> information needed to guide the indexing process. The header would be
> parsed and then ignored by the tokenizer. However, the header would then be
> stored in that field together with the actual text...
>
> I wonder (again...) whether it's possible to control which part of the
> text gets stored during indexing. In other words, is it possible to strip
> the header when storing the text in the field?
>
> Best regards,
>
> Guan
>
> -----Original Message-----
> From: Mikhail Khludnev <m...@apache.org>
> Sent: Monday, April 24, 2023 4:20 PM
> To: java-user@lucene.apache.org
> Subject: Re: Can an analyzer access other field's data during index time?
>
> Hello Guan.
> It reminds me of https://youtu.be/EkkzSLstSAE?t=1531 (see the timecode).
> I'm afraid that's quite far from the existing codebase, where a Field has
> no reference to its enclosing Document. Sigh.
>
>
> On Mon, Apr 24, 2023 at 6:00 PM Wang, Guan <wan...@med.umich.edu> wrote:
>
> > Hi,
> >
> > I understand that Lucene analyzers work on a per-field basis. But I wonder
> > whether it's even possible for an analyzer on field A to access data in
> > field B at any stage of the indexing process, say in a CharFilter,
> > Tokenizer or TokenFilter?
> >
> > I'd like to control the behavior of the indexing process for field A
> > based upon the value in field B.
> >
> > Mighty Lucene community, please let me know if this is doable...
> >
> > Many thanks,
> >
> > Guan
> > **********************************************************
> > Electronic Mail is not secure, may not be read every day, and should
> > not be used for urgent or sensitive issues
> >
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> https://t.me/MUST_SEARCH
> A caveat: Cyrillic!
>
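On the question quoted above about keeping the header out of the stored value: a field's indexed text and its stored value don't have to be the same string. One common pattern is to add the full text (header included) as an unstored TextField, and the stripped text as a StoredField under the same field name. The sketch below assumes Lucene 8.x/9.x. The one-line header convention and the names StoreBodyOnly, stripHeader, buildDoc and roundTrip are hypothetical. StandardAnalyzer stands in for whatever header-aware analyzer you would actually use.

```java
import java.io.IOException;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class StoreBodyOnly {

  // Hypothetical convention: the header is everything up to the first '\n'.
  static String stripHeader(String text) {
    int nl = text.indexOf('\n');
    return nl < 0 ? text : text.substring(nl + 1);
  }

  static Document buildDoc(String fullText) {
    Document doc = new Document();
    // Indexed (analyzed) but not stored: the analyzer still sees the header.
    doc.add(new TextField("body", fullText, Field.Store.NO));
    // Stored but not indexed: only the text after the header is kept for retrieval.
    doc.add(new StoredField("body", stripHeader(fullText)));
    return doc;
  }

  /** Indexes one document in memory and returns the stored "body" value read back. */
  static String roundTrip(String fullText) throws IOException {
    try (Directory dir = new ByteBuffersDirectory()) {
      try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
        writer.addDocument(buildDoc(fullText));
      }
      try (DirectoryReader reader = DirectoryReader.open(dir)) {
        return reader.document(0).get("body");
      }
    }
  }

  public static void main(String[] args) throws IOException {
    System.out.println(roundTrip("#lower\nHello World")); // Hello World
  }
}
```

Only the stored copy loses the header. The terms produced from the header (if the analyzer doesn't drop them) would still be in the index.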


-- 
Sincerely yours
Mikhail Khludnev
https://t.me/MUST_SEARCH
A caveat: Cyrillic!
