Hi Mikhail,

Thank you for the definitive answer!

I could "solve" this by adding a header to the document that carries the 
information needed to guide the indexing process. The header would be parsed 
and then ignored by the tokenizer. However, the header would still be stored in 
that field together with the actual text...

I wonder (again...) whether it's possible to control which part of the text 
gets stored during indexing. In other words, is it possible to strip the header 
when storing the text in the field?
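A minimal sketch of the header idea, assuming a hypothetical header format of 
"key: value" lines followed by a blank line (Lucene defines no such format). 
The helper class HeaderSplitter is purely illustrative. It separates the parsed 
header from the body, so the application decides which string gets stored:

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical helper (not part of Lucene). Assumes every document starts
 * with a header block of "key: value" lines ended by a blank line:
 *
 *   lang: en
 *   source: notes
 *
 *   body text...
 */
final class HeaderSplitter {

    /** Parsed header entries plus the body text that should be stored. */
    static final class Split {
        final Map<String, String> header;
        final String body;

        Split(Map<String, String> header, String body) {
            this.header = header;
            this.body = body;
        }
    }

    static Split split(String text) {
        Map<String, String> header = new LinkedHashMap<>();
        // Normalize Windows line endings so the blank-line search works.
        String normalized = text.replace("\r\n", "\n");
        int end = normalized.indexOf("\n\n");
        if (end < 0) {
            // No blank line found: treat everything as body, with no header.
            return new Split(header, normalized);
        }
        for (String line : normalized.substring(0, end).split("\n")) {
            int colon = line.indexOf(':');
            if (colon > 0) {
                // Key is the text before the first colon, value the rest.
                header.put(line.substring(0, colon).trim(),
                           line.substring(colon + 1).trim());
            }
        }
        // Everything after the blank line is the body.
        return new Split(header, normalized.substring(end + 2));
    }
}
```

In Lucene, one common way to index one value but store another is to add two 
instances of the same field name to the Document: for example 
new TextField("content", fullText, Field.Store.NO) for the analyzed text 
(header included, so the analysis chain can still read it) and 
new StoredField("content", split.body) for the stored value, which then comes 
back without the header. The field name "content" is only an example.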

Best regards,

Guan

-----Original Message-----
From: Mikhail Khludnev <m...@apache.org>
Sent: Monday, April 24, 2023 4:20 PM
To: java-user@lucene.apache.org
Subject: Re: Can an analyzer access other field's data during index time?


Hello Guan.
It reminds me of this talk: https://youtu.be/EkkzSLstSAE?t=1531
I'm afraid that's quite far from the existing codebase, where a Field has no 
reference to its enclosing Document. Sigh.


On Mon, Apr 24, 2023 at 6:00 PM Wang, Guan <wan...@med.umich.edu> wrote:

> Hi,
>
> I understand that Lucene analyzers work on a per-field basis. But I wonder
> if it's even possible for the analyzer on field A to access data in
> field B at any stage of the indexing process, say in a CharFilter,
> Tokenizer or TokenFilter?
>
> I'd like to control the behavior of the indexing process for field A
> based upon the value in field B.
>
> Mighty Lucene community, please let me know if this is doable...
>
> Many thanks,
>
> Guan
> **********************************************************
> Electronic Mail is not secure, may not be read every day, and should
> not be used for urgent or sensitive issues
>


--
Sincerely yours
Mikhail Khludnev
https://t.me/MUST_SEARCH
A caveat: Cyrillic!
