Dear all,

I am currently reading in text fields that contain XML. For example, the Solr input may look like this:
<field name="tagged_text"><sec sec-type="Introduction" id="SECID0E4F"> <title>Introduction</title> </sec></field>

In the actual input, all "<" and ">" inside the field value are escaped.

I wrote a tokenizer that indexes the tag attributes (e.g. sec-type="Introduction") at the position of the tagged word ("Introduction" in this case), so I need the tags to be present at index time. However, I want to strip the markup from the stored string that is shown to the user in query results.

So far, I have found out that the indexed terms and the stored string are kept separately. I therefore thought it should be possible to manipulate the stored string on its own, e.g. after indexing. Is there a way to do this? I would prefer to manipulate the stored string rather than add a second field with the plain text to the input file.

I would be grateful for any help!

Best regards,
Adrian

-------------------------------------------------------
Adrian Pachzelt
- Fachinformationsdienst Biodiversitaetsforschung -
- Hosting of Open Access journals -
Universitaetsbibliothek Johann Christian Senckenberg
Bockenheimer Landstr. 134-138
60325 Frankfurt am Main
Tel. 069/798-39382
a.pachz...@ub.uni-frankfurt.de
-------------------------------------------------------
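For illustration, here is a minimal, hypothetical Java sketch of the two views of one field value that the question contrasts: the words together with the attributes of their enclosing tags (roughly what the tokenizer described above puts into the index) and the tag-free plain text that should be stored. It uses only the JDK's SAX parser, not the Solr/Lucene Tokenizer API, and it is not the tokenizer from the question. The names `XmlFieldViews`, `taggedWords` and `plainText` are made up, and the sketch assumes the field value arrives as plain markup, i.e. after the `&lt;`/`&gt;` escaping of the update message has been undone.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper for illustration only; not part of Solr or Lucene.
final class XmlFieldViews {

    // One word, its 0-based position, and the merged attributes of all enclosing tags.
    record TaggedWord(int position, String word, Map<String, String> attributes) {}

    // Index-time view: words are split on whitespace and at tag boundaries,
    // and each word carries the attributes of the elements it sits in.
    static List<TaggedWord> taggedWords(String xml) throws Exception {
        List<TaggedWord> words = new ArrayList<>();
        Deque<Map<String, String>> open = new ArrayDeque<>(); // stack of open elements' attributes
        StringBuilder pending = new StringBuilder();          // text seen since the last tag

        DefaultHandler handler = new DefaultHandler() {
            private void flush() {
                // Merge outermost element first so inner elements override outer ones.
                Map<String, String> attrs = new LinkedHashMap<>();
                open.descendingIterator().forEachRemaining(attrs::putAll);
                Map<String, String> frozen = Collections.unmodifiableMap(attrs);
                for (String w : pending.toString().trim().split("\\s+")) {
                    if (!w.isEmpty()) {
                        words.add(new TaggedWord(words.size(), w, frozen));
                    }
                }
                pending.setLength(0);
            }

            @Override
            public void startElement(String uri, String localName, String qName, Attributes a) {
                flush(); // text before this tag belongs to the outer element
                Map<String, String> attrs = new LinkedHashMap<>();
                for (int i = 0; i < a.getLength(); i++) {
                    attrs.put(a.getQName(i), a.getValue(i));
                }
                open.push(attrs);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                flush(); // text inside this element still sees its attributes
                open.pop();
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                pending.append(ch, start, length);
            }
        };

        // Wrap in a synthetic root so fragments with several top-level elements still parse.
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader("<root>" + xml + "</root>")), handler);
        return words;
    }

    // Stored view: the same words with all markup removed, joined by single spaces.
    static String plainText(String xml) throws Exception {
        return taggedWords(xml).stream().map(TaggedWord::word).collect(Collectors.joining(" "));
    }

    public static void main(String[] args) throws Exception {
        String field = "<sec sec-type=\"Introduction\" id=\"SECID0E4F\"> <title>Introduction</title> </sec>";
        for (TaggedWord t : taggedWords(field)) {
            System.out.println(t.position() + " " + t.word() + " " + t.attributes());
        }
        System.out.println(plainText(field)); // prints "Introduction"
    }
}
```

For the example field, "Introduction" ends up at position 0 with `sec-type` and `id` attached, while the stored view is just "Introduction". Letting inner elements override attributes of the same name on outer elements is an assumption of this sketch, not necessarily what the tokenizer in question does.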