Hi,

You don't need a second field name. Add the field once as indexed with 
stored=false, then add a second instance with the same field name that carries 
the content to be stored, but is not indexed. If you want docvalues, the 
same can be done for docvalues. Internally, Lucene handles it like that anyway. 
Adding a field that is stored and indexed at the same time is just for convenience.
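
A rough sketch of how the stored value could be derived, assuming the field 
value reaches your code with real "<" and ">" (i.e. already unescaped). The 
class name StoredValueSketch and the helper stripTags are made up for 
illustration, and the regex is a naive tag stripper, not a real XML parser. 
The Lucene calls appear only in comments so the snippet runs without Lucene 
on the classpath:

```java
import java.util.regex.Pattern;

final class StoredValueSketch {
    // Naive match for anything that looks like a tag, e.g. <sec id="x"> or </title>.
    private static final Pattern TAG = Pattern.compile("<[^>]+>");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    /** Replaces tags with spaces, collapses whitespace and trims the result. */
    public static String stripTags(String taggedText) {
        String withoutTags = TAG.matcher(taggedText).replaceAll(" ");
        return WHITESPACE.matcher(withoutTags).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        String tagged = "<sec sec-type=\"Introduction\" id=\"SECID0E4F\">\n"
                + "<title>Introduction</title>\n"
                + "</sec>";
        String stored = stripTags(tagged);

        // With Lucene, both values would then go into the SAME field name:
        //   doc.add(new TextField("tagged_text", tagged, Field.Store.NO)); // indexed, not stored
        //   doc.add(new StoredField("tagged_text", stored));               // stored, not indexed
        System.out.println(stored);
    }
}
```

The tagged string goes through your tokenizer, while only the stripped string 
is returned as the stored value.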

Uwe

On May 9, 2018 5:57:40 AM UTC, "Pachzelt, Adrian" 
<a.pachz...@ub.uni-frankfurt.de> wrote:
>Dear all,
>
>currently I am reading text fields that contain xml text. Hence, the
>solr input may look like this:
>
><field name="tagged_text">&lt;sec sec-type="Introduction"
>id="SECID0E4F"&gt;
>&lt;title&gt;Introduction&lt;/title&gt;
>&lt;/sec&gt;
></field>
>
>With all “<” and “>” escaped.
>I wrote a tokenizer that indexes the tag attributes (e.g.
>sec-type="Introduction") at the position of the tagged word
>(“Introduction” in this case) and hence I need the HTML tags when
>indexing. However, I want to strip the HTML from the stored string that
>is shown to the user on a query. So far, I figured out that the index
>and the stored string are separate. Thus, I thought it should be
>possible to manipulate the stored string after indexing.
>
>Is there a way to do so? I would prefer to manipulate the stored string
>and not introduce a second field with the plain text in the input file.
>
>I am glad for any help!
>
>Best Regards,
>
>Adrian
>
>-------------------------------------------------------
>Adrian Pachzelt
>- Fachinformationsdienst Biodiversitaetsforschung -
>- Hosting von Open Access-Zeitschriften -
>Universitaetsbibliothek Johann Christian Senckenberg
>Bockenheimer Landstr. 134-138
>60325 Frankfurt am Main
>Tel. 069/798-39382
>a.pachz...@ub.uni-frankfurt.de<mailto:a.pachz...@ub.uni-frankfurt.de>
>-------------------------------------------------------

--
Uwe Schindler
Achterdiek 19, 28357 Bremen
https://www.thetaphi.de
