Guan,
I don't quite grasp the particular obstacle, but I don't think the task is
out of reach overall. Can you share a test case that formally describes the
desired behavior?
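
Such a test case could be as small as the sketch below, written against the
three-line example quoted further down (HEADER, TEXT, TEXT). It is only an
illustration under assumptions: the header is taken to be exactly the first
line, and HeaderStripper, header() and body() are hypothetical names, not
Lucene APIs. The idea is to split the text before the field is built, so that
only the body ever reaches the stored value.

```java
/** Hypothetical helper (not part of Lucene): separates a leading header line from the text. */
final class HeaderStripper {

    private HeaderStripper() {}

    /** Returns the first line with surrounding whitespace trimmed, or "" if there is no line break. */
    static String header(String raw) {
        int eol = raw.indexOf('\n');
        return eol < 0 ? "" : raw.substring(0, eol).trim();
    }

    /** Returns everything after the first line; text without a line break is returned unchanged. */
    static String body(String raw) {
        int eol = raw.indexOf('\n');
        return eol < 0 ? raw : raw.substring(eol + 1);
    }

    public static void main(String[] args) {
        String raw = "HEADER\nTEXT\nTEXT";

        String header = header(raw); // available to guide indexing decisions
        String body = body(raw);     // the only part meant to be analyzed and stored

        // Desired behavior: the stored value no longer contains the header.
        if (!header.equals("HEADER") || !body.equals("TEXT\nTEXT")) {
            throw new AssertionError("unexpected split: [" + header + "] / [" + body + "]");
        }

        // Assumption: with Lucene on the classpath, body (not raw) would be the
        // value handed to the field, e.g. new TextField("text", body, Field.Store.YES).
        System.out.println(body);
    }
}
```

The header value can then drive whatever per-document decision is needed,
while only the body is handed to the field that gets stored.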

On Tue, Apr 25, 2023 at 12:29 AM Wang, Guan <wan...@med.umich.edu> wrote:

> Hi Mikhail,
>
> Thank you for introducing the abstract class ConditionalTokenFilter to me!
> I took a quick look: it wraps the upstream TokenStream and applies a
> filter conditionally.
>
> So, if I have a document like:
>
> HEADER
> TEXT
> TEXT
>
> Implementing ConditionalTokenFilter would let me tokenize only lines 2 and 3.
> However, all 3 lines would still be stored in the field if index=true and
> stored=true...
>
> I wonder whether I could store only lines 2 and 3 in the field in such a
> scenario?
>
> Many thanks,
>
> Guan
>
> -----Original Message-----
> From: Mikhail Khludnev <m...@apache.org>
> Sent: Monday, April 24, 2023 4:56 PM
> To: java-user@lucene.apache.org
> Subject: Re: Can an analyzer access other field's data during index time?
>
> Well.. maybe something like
>
> https://lucene.apache.org/core/8_5_1/analyzers-common/org/apache/lucene/analysis/miscellaneous/ConditionalTokenFilter.html
> ?
>
> On Mon, Apr 24, 2023 at 11:40 PM Wang, Guan <wan...@med.umich.edu> wrote:
>
> > Hi Mikhail,
> >
> > Thank you for the definitive answer!
> >
> > I could "solve" this by adding a header to the document with the
> > information needed to guide the indexing process. The header would be
> > parsed and then ignored by the tokenizer. However, the header would be
> > stored in that field together with the actual text...
> >
> > I wonder (again...) whether I can control which part of the text is
> > stored during indexing? In other words, is it possible to strip the
> > header when storing the text into the field?
> >
> > Best regards,
> >
> > Guan
> >
> > -----Original Message-----
> > From: Mikhail Khludnev <m...@apache.org>
> > Sent: Monday, April 24, 2023 4:20 PM
> > To: java-user@lucene.apache.org
> > Subject: Re: Can an analyzer access other field's data during index time?
> >
> > Hello Guan.
> > This reminds me of https://youtu.be/EkkzSLstSAE?t=1531 (note the timecode).
> > I'm afraid it's quite far from the existing codebase, where a Field
> > has no reference to its enclosing Document. Sigh.
> >
> >
> > On Mon, Apr 24, 2023 at 6:00 PM Wang, Guan <wan...@med.umich.edu> wrote:
> >
> > > Hi,
> > >
> > > I understand that Lucene analyzers work on a per-field basis. But I
> > > wonder if it's even possible for an analyzer on field A to access
> > > data in field B during indexing at any stage, say CharFilter,
> > > Tokenizer or TokenFilter?
> > >
> > > I'd like to control the behavior of the indexing process for field A
> > > based upon the value in field B.
> > >
> > > Mighty Lucene community, please let me know if this is doable...
> > >
> > > Many thanks,
> > >
> > > Guan
> > > **********************************************************
> > > Electronic Mail is not secure, may not be read every day, and should
> > > not be used for urgent or sensitive issues
> > >
> >
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
> > https://t.me/MUST_SEARCH
> > A caveat: Cyrillic!
> >
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> https://t.me/MUST_SEARCH
> A caveat: Cyrillic!
>


-- 
Sincerely yours
Mikhail Khludnev
https://t.me/MUST_SEARCH
A caveat: Cyrillic!
