Manipulate stored string in Lucene

Pachzelt, Adrian Tue, 08 May 2018 22:58:30 -0700

Dear all,

Currently I am reading text fields that contain XML text, so the Solr input
may look like this:


<field name="tagged_text">&lt;sec sec-type="Introduction" id="SECID0E4F"&gt;
&lt;title&gt;Introduction&lt;/title&gt;
&lt;/sec&gt;
</field>

All “<” and “>” characters inside the field are escaped.
I wrote a tokenizer that indexes the tag attributes (e.g.
sec-type="Introduction") at the position of the tagged word ("Introduction" in
this case), so I need the HTML tags to be present when indexing. However, I
want to strip the HTML from the stored string that is shown to the user in
query results. So far, I have figured out that the index and the stored string
are kept separately. Thus, I thought it should be possible to manipulate the
stored string after indexing.
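
To make the desired transformation concrete, here is a minimal sketch in plain
Java. It assumes the stored string still contains the escaped entities exactly
as in the input file above. The class and method names are hypothetical, and
the code uses no Solr or Lucene API; where such a step could hook into Solr is
exactly the open question.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Solr or Lucene.
class StoredTextCleaner {
    // Matches anything that looks like a tag, e.g. <sec id="x"> or </title>.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    /** Unescapes the markup, removes the tags, and collapses whitespace. */
    static String stripTags(String escaped) {
        String xml = escaped.replace("&lt;", "<")
                            .replace("&gt;", ">")
                            .replace("&quot;", "\"");
        String noTags = TAG.matcher(xml).replaceAll(" ");
        String text = WHITESPACE.matcher(noTags).replaceAll(" ").trim();
        // Unescape &amp; last so that "&amp;lt;" is not mistaken for a tag.
        return text.replace("&amp;", "&");
    }

    public static void main(String[] args) {
        String stored = "&lt;sec sec-type=\"Introduction\" id=\"SECID0E4F\"&gt;\n"
                + "&lt;title&gt;Introduction&lt;/title&gt;\n"
                + "&lt;/sec&gt;\n";
        System.out.println(stripTags(stored));
    }
}
```

For the example field above, the tokenizer would still see the full tagged
text, while the user would only see the remaining plain text.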

Is there a way to do so? I would prefer to manipulate the stored string and not 
introduce a second field with the plain text in the input file.

I would be grateful for any help!

Best Regards,

Adrian

-------------------------------------------------------
Adrian Pachzelt
- Fachinformationsdienst Biodiversitaetsforschung -
- Hosting von Open Access-Zeitschriften -
Universitaetsbibliothek Johann Christian Senckenberg
Bockenheimer Landstr. 134-138
60325 Frankfurt am Main
Tel. 069/798-39382
a.pachz...@ub.uni-frankfurt.de
-------------------------------------------------------
