RE: Can an analyzer access other field's data during index time?

Wang, Guan Mon, 24 Apr 2023 14:29:40 -0700

Hi Mikhail,

Thank you for introducing the abstract class ConditionalTokenFilter to me! I
took a quick look: it wraps the upstream TokenStream and applies a filter
conditionally.


So, if I have a document like:

HEADER
TEXT
TEXT

With a ConditionalTokenFilter implementation, I could tokenize only lines 2
and 3. However, all 3 lines would still be stored in the field if index=true
and stored=true...

I wonder whether I could store only lines 2 and 3 in the field in such a scenario?

Many thanks,

Guan
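
One possible workaround, sketched under assumptions: a stored field keeps the
original string handed to it, regardless of what the analyzer does, so the
header can be split off in application code before the Document is built. Only
lines 2 and 3 then reach the field. The names below (HeaderSplitter, Parts,
split) and the field name "content" are hypothetical, and treating the first
line as the header is an assumption based on the example above.

```java
import java.util.Objects;

/** Splits a raw document into its first line (the header) and the remaining text (the body). */
public final class HeaderSplitter {

    /** Simple holder for the two parts of a raw document. */
    public static final class Parts {
        public final String header;
        public final String body;

        Parts(String header, String body) {
            this.header = header;
            this.body = body;
        }
    }

    private HeaderSplitter() {}

    /**
     * Treats everything up to the first line break as the header.
     * If there is no line break, the whole input is the header and the body is empty.
     */
    public static Parts split(String raw) {
        Objects.requireNonNull(raw, "raw");
        int nl = raw.indexOf('\n');
        if (nl < 0) {
            return new Parts(raw, "");
        }
        String header = raw.substring(0, nl);
        // Tolerate Windows line endings: drop a trailing '\r' from the header.
        if (header.endsWith("\r")) {
            header = header.substring(0, header.length() - 1);
        }
        return new Parts(header, raw.substring(nl + 1));
    }

    public static void main(String[] args) {
        Parts p = split("HEADER\nTEXT\nTEXT");
        System.out.println("header = " + p.header);
        System.out.println("body   = " + p.body);
        // With Lucene on the classpath, only the body would then be handed to the field, e.g.
        //   doc.add(new TextField("content", p.body, Field.Store.YES));
        // while p.header could drive per-document choices in application code.
    }
}
```

Because the split happens before the Document is built, the header never
enters the index or the stored value, and no custom CharFilter or TokenFilter
is needed just to hide it.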

-----Original Message-----
From: Mikhail Khludnev <m...@apache.org>
Sent: Monday, April 24, 2023 4:56 PM
To: java-user@lucene.apache.org
Subject: Re: Can an analyzer access other field's data during index time?

Well.. maybe something like
https://lucene.apache.org/core/8_5_1/analyzers-common/org/apache/lucene/analysis/miscellaneous/ConditionalTokenFilter.html
?

On Mon, Apr 24, 2023 at 11:40 PM Wang, Guan <wan...@med.umich.edu> wrote:

> Hi Mikhail,
>
> Thank you for the definitive answer!
>
> I could "solve" this by adding a header to the document with the
> information needed to guide the indexing process. The header would be
> parsed and then ignored by the tokenizer. However, the header would still
> be stored in that field together with the actual text...
>
> I wonder (again...) whether I can control which part of the text is
> stored during the indexing process. In other words, is it possible to
> strip the header when storing the text in the field?
>
> Best regards,
>
> Guan
>
> -----Original Message-----
> From: Mikhail Khludnev <m...@apache.org>
> Sent: Monday, April 24, 2023 4:20 PM
> To: java-user@lucene.apache.org
> Subject: Re: Can an analyzer access other field's data during index time?
>
> Hello Guan.
> It reminds me of https://youtu.be/EkkzSLstSAE?t=1531 (at that timecode).
> I'm afraid it's quite far from the existing codebase, where the Field
> has no reference to the enclosing Document. Sigh.
>
>
> On Mon, Apr 24, 2023 at 6:00 PM Wang, Guan <wan...@med.umich.edu> wrote:
>
> > Hi,
> >
> > I understand that a Lucene analyzer works on a per-field basis. But I
> > wonder if it's even possible for an analyzer on field A to access
> > data in field B at any stage of the indexing process, say in a
> > CharFilter, Tokenizer or TokenFilter?
> >
> > I'd like to control the behavior of the indexing process for field A
> > based upon the value in field B.
> >
> > Mighty Lucene community, please let me know if this is doable...
> >
> > Many thanks,
> >
> > Guan
> > **********************************************************
> > Electronic Mail is not secure, may not be read every day, and should
> > not be used for urgent or sensitive issues
> >
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> https://t.me/MUST_SEARCH
> A caveat: Cyrillic!
>


--
Sincerely yours
Mikhail Khludnev
https://t.me/MUST_SEARCH
A caveat: Cyrillic!
