I will check this out! Thank you, Mikhail! :)

-------------------------------------------------------
Adrian Pachzelt
- Fachinformationsdienst Biodiversitaetsforschung -
- Hosting von Open Access-Zeitschriften -
Universitaetsbibliothek Johann Christian Senckenberg
Bockenheimer Landstr. 134-138
60325 Frankfurt am Main
Tel. 069/798-39382
a.pachz...@ub.uni-frankfurt.de
-------------------------------------------------------

-----Original Message-----
From: Mikhail Khludnev [mailto:m...@apache.org]
Sent: Wednesday, May 9, 2018 11:15
To: general@lucene.apache.org
Subject: Re: Manipulate stored string in Lucene

Hello, Adrian.

If I understood you correctly, that is a job for an UpdateRequestProcessor; see
https://lucene.apache.org/solr/guide/7_3/update-request-processors.html

On Wed, May 9, 2018 at 11:39 AM, Pachzelt, Adrian <a.pachz...@ub.uni-frankfurt.de> wrote:

> Hi Uwe,
>
> thanks for the advice. Yes, I use Solr overall, but thought it would be a
> Lucene issue.
>
> Previously, I followed your proposed solution. I set the original field to
> stored=false indexed=true, created a copyField, and set the copied field to
> stored=true indexed=false. However, I do not know how to manipulate the
> stored string in the copyField. Do you have an idea?
>
> Thanks a lot! :)
>
> Adrian
>
> -----Original Message-----
> From: Uwe Schindler [mailto:u...@thetaphi.de]
> Sent: Wednesday, May 9, 2018 08:11
> To: general@lucene.apache.org
> Subject: Re: Manipulate stored string in Lucene
>
> Oh, it's Solr? Then it's not easily possible. Plain Lucene works like that.
>
> Uwe
>
> On May 9, 2018 6:09:42 AM UTC, Uwe Schindler <u...@thetaphi.de> wrote:
> >Hi,
> >
> >You don't need a second field name. You can add the indexed field once
> >with stored=false and then add a second instance with the same field
> >name and the original stored content, but not indexed. If you want
> >docvalues, the same can be done for docvalues. Internally, Lucene
> >does it like that anyway. Adding a field that is stored and indexed at
> >the same time is just for convenience.
> >
> >Uwe
> >
> >On May 9, 2018 5:57:40 AM UTC, "Pachzelt, Adrian"
> ><a.pachz...@ub.uni-frankfurt.de> wrote:
> >>Dear all,
> >>
> >>currently I am reading text fields that contain XML text. Hence, the
> >>Solr input may look like this:
> >>
> >><field name="tagged_text"><sec sec-type="Introduction"
> >>id="SECID0E4F">
> >><title>Introduction</title>
> >></sec>
> >></field>
> >>
> >>With all "<" and ">" escaped.
> >>I wrote a tokenizer that indexes the tag attributes (e.g.
> >>sec-type="Introduction") at the position of the tagged word
> >>("Introduction" in this case), and hence I need the HTML tags when
> >>indexing. However, I want to strip the HTML from the stored string that
> >>is shown to the user on a query. So far, I figured out that the index
> >>and the stored string are separate. Thus, I thought it should be
> >>possible to manipulate the stored string after indexing.
> >>
> >>Is there a way to do so? I would prefer to manipulate the stored
> >>string and not introduce a second field with the plain text in the
> >>input file.
> >>
> >>I am glad for any help!
> >>
> >>Best Regards,
> >>
> >>Adrian
> >
> >--
> >Uwe Schindler
> >Achterdiek 19, 28357 Bremen
> >https://www.thetaphi.de

--
Sincerely yours
Mikhail Khludnev
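
The transformation asked about in the thread (keep the tagged text for indexing, show only plain text to the user) can be sketched independently of Solr. The snippet below is only an illustration of the stripping step, not Solr or Lucene API: the function names `strip_markup` and `split_field` and the dictionary keys are hypothetical, and the regular expression assumes simple tags without ">" inside attribute values. In a real setup the same logic would presumably live in an update processor or in the client that builds the input documents.

```python
import html
import re

# Matches one XML/HTML tag such as <sec sec-type="Introduction"> or </title>.
# Assumption: attribute values never contain a literal ">".
TAG_RE = re.compile(r"<[^>]+>")


def strip_markup(escaped_xml: str) -> str:
    """Return the plain text of an XML fragment whose "<" and ">" are escaped."""
    markup = html.unescape(escaped_xml)          # "&lt;sec&gt;" -> "<sec>"
    text = TAG_RE.sub(" ", markup)               # drop tags, keep their text content
    text = html.unescape(text)                   # resolve entities inside the text itself
    return " ".join(text.split())                # collapse leftover whitespace


def split_field(escaped_xml: str) -> dict:
    """Build the two values discussed in the thread (hypothetical key names)."""
    return {
        "indexed_only": html.unescape(escaped_xml),  # tagged text for the custom tokenizer
        "stored_only": strip_markup(escaped_xml),    # plain text shown to the user
    }


if __name__ == "__main__":
    sample = (
        '&lt;sec sec-type="Introduction" id="SECID0E4F"&gt;\n'
        "&lt;title&gt;Introduction&lt;/title&gt;\n"
        "&lt;/sec&gt;"
    )
    print(split_field(sample)["stored_only"])
```

Keeping the two values separate mirrors the stored=false/indexed=true and stored=true/indexed=false split described above; only the stored value passes through the stripping step.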