Hi,

You don't need a second field name. You can add the field once as indexed with stored=false, and then add a second instance with the same field name that carries the content to be stored, but is not indexed. If you want DocValues, the same can be done for them. Internally, Lucene does it like that anyway. Adding a field that is stored and indexed at the same time is just a convenience.
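At the Lucene level, the idea could look roughly like the sketch below. It is only an illustration under some assumptions: the class name `TaggedTextDoc` and the helper `stripTags` are made up for this example, the regex-based tag stripping is deliberately naive, and the custom analyzer that reads the tag attributes is not shown. How this maps onto a Solr schema or update chain is not covered here.

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexableField;

public class TaggedTextDoc {

    // Hypothetical helper: naive tag stripping, for illustration only.
    // A real setup would probably use a proper XML/HTML parser.
    static String stripTags(String xml) {
        return xml.replaceAll("<[^>]*>", " ").replaceAll("\\s+", " ").trim();
    }

    // Builds a document that carries two instances of the same field name.
    static Document build(String taggedXml) {
        Document doc = new Document();
        // Instance 1: indexed, not stored. The analyzer still sees the tags.
        doc.add(new TextField("tagged_text", taggedXml, Field.Store.NO));
        // Instance 2: stored only, not indexed. This is what a search returns.
        doc.add(new StoredField("tagged_text", stripTags(taggedXml)));
        return doc;
    }

    public static void main(String[] args) {
        String xml = "<sec sec-type=\"Introduction\" id=\"SECID0E4F\">"
                + "<title>Introduction</title></sec>";
        Document doc = build(xml);
        // Print each field instance with its stored flag and value.
        for (IndexableField f : doc.getFields()) {
            System.out.println(f.name() + " stored=" + f.fieldType().stored()
                    + " value=" + f.stringValue());
        }
    }
}
```

Both instances share the name `tagged_text`, so queries run against the tagged text while the stored value returned to the user is the stripped one.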
Uwe

On May 9, 2018 5:57:40 AM UTC, "Pachzelt, Adrian" <a.pachz...@ub.uni-frankfurt.de> wrote:
>Dear all,
>
>currently I am reading text fields that contain xml text. Hence, the
>solr input may look like this:
>
><field name="tagged_text"><sec sec-type="Introduction" id="SECID0E4F">
><title>Introduction</title>
></sec>
></field>
>
>With all "<" and ">" escaped.
>I wrote a tokenizer that indexes the tag attributes (e.g.
>sec-type="Introduction") on the position of the tagged word
>("Introduction" in this case) and hence I need the HTML tags when
>indexing. However, I want to strip the HTML in the stored string that
>is shown to the user on a query. So far, I figured out that the index
>and the stored string are separated. Thus, I thought it should be
>possible to manipulate the stored string after indexing.
>
>Is there a way to do so? I would prefer to manipulate the stored string
>and not introduce a second field with the plain text in the input file.
>
>I am glad for any help!
>
>Best Regards,
>
>Adrian
>
>-------------------------------------------------------
>Adrian Pachzelt
>- Fachinformationsdienst Biodiversitaetsforschung -
>- Hosting von Open Access-Zeitschriften -
>Universitaetsbibliothek Johann Christian Senckenberg
>Bockenheimer Landstr. 134-138
>60325 Frankfurt am Main
>Tel. 069/798-39382
>a.pachz...@ub.uni-frankfurt.de
>-------------------------------------------------------

--
Uwe Schindler
Achterdiek 19, 28357 Bremen
https://www.thetaphi.de