I will check this out! Thank you, Mikhail! :)

-------------------------------------------------------
Adrian Pachzelt
- Fachinformationsdienst Biodiversitaetsforschung -
- Hosting von Open Access-Zeitschriften -
Universitaetsbibliothek Johann Christian Senckenberg
Bockenheimer Landstr. 134-138
60325 Frankfurt am Main
Tel. 069/798-39382
a.pachz...@ub.uni-frankfurt.de
-------------------------------------------------------


-----Original Message-----
From: Mikhail Khludnev [mailto:m...@apache.org]
Sent: Wednesday, May 9, 2018 11:15
To: general@lucene.apache.org
Subject: Re: Manipulate stored string in Lucene

Hello, Adrian.
If I understood you correctly, this is a job for an UpdateRequestProcessor; see
https://lucene.apache.org/solr/guide/7_3/update-request-processors.html
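
As a rough illustration of the transformation an UpdateRequestProcessor would apply to the value destined for storage, here is a minimal sketch in plain Java. StoredTextCleaner and stripTags are hypothetical names, not Solr or Lucene API. The sketch assumes Solr has already decoded the &lt; and &gt; entities while parsing the input document, so the field value contains real tags. It does not decode other entities, and it will misbehave on attribute values that contain ">". Solr also ships HTMLStripFieldUpdateProcessorFactory, which may cover this case without custom code.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not part of Solr or Lucene): produces the plain text
 * that a custom update processor could write into the stored-only field.
 */
final class StoredTextCleaner {

    // Matches one start, end, or empty-element tag, e.g. <sec id="x">, </sec>, <br/>.
    private static final Pattern TAG = Pattern.compile("<[^>]+>");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    private StoredTextCleaner() {}

    /** Replaces each tag with a space, then collapses the whitespace left behind. */
    static String stripTags(String taggedText) {
        String withoutTags = TAG.matcher(taggedText).replaceAll(" ");
        return WHITESPACE.matcher(withoutTags).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        // The example value from the original question, after entity decoding.
        String tagged = "<sec sec-type=\"Introduction\" id=\"SECID0E4F\">\n"
                + "<title>Introduction</title>\n"
                + "</sec>";
        System.out.println(stripTags(tagged)); // prints "Introduction"
    }
}
```

In a processor chain, a step like this would run on the copy that is stored but not indexed, while the field read by the custom tokenizer keeps its tags.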


On Wed, May 9, 2018 at 11:39 AM, Pachzelt, Adrian <
a.pachz...@ub.uni-frankfurt.de> wrote:

> Hi Uwe,
>
> thanks for the advice. Yes, I use Solr overall, but thought it would be a
> Lucene issue.
>
> I had already tried your proposed solution. I set the original field to
> stored=false indexed=true, created a copyField, and set the copied field to
> stored=true indexed=false. However, I do not know how to manipulate the
> stored string in the copyField. Do you have an idea?
>
> Thanks a lot! :)
>
> Adrian
>
>
> -----Original Message-----
> From: Uwe Schindler [mailto:u...@thetaphi.de]
> Sent: Wednesday, May 9, 2018 08:11
> To: general@lucene.apache.org
> Subject: Re: Manipulate stored string in Lucene
>
> Oh, it's Solr? Then it's not easily possible. Plain Lucene works like that.
>
> Uwe
>
> On May 9, 2018 6:09:42 AM UTC, Uwe Schindler <u...@thetaphi.de> wrote:
> >Hi,
> >
> >You don't need a second field name. You can add the field once as indexed
> >with stored=false, and then add a second instance with the same field
> >name that carries the stored content but is not indexed. If you want
> >docvalues, the same can be done for them. Internally, Lucene does it like
> >that anyway. Adding a field that is stored and indexed at the same time
> >is just a convenience.
> >
> >Uwe
> >
> >On May 9, 2018 5:57:40 AM UTC, "Pachzelt, Adrian"
> ><a.pachz...@ub.uni-frankfurt.de> wrote:
> >>Dear all,
> >>
> >>currently I am reading text fields that contain xml text. Hence, the
> >>solr input may look like this:
> >>
> >><field name="tagged_text">&lt;sec sec-type="Introduction"
> >>id="SECID0E4F"&gt;
> >>&lt;title&gt;Introduction&lt;/title&gt;
> >>&lt;/sec&gt;
> >></field>
> >>
> >>With all “<” and “>” escaped.
> >>I wrote a tokenizer that indexes the tag attributes (e.g.
> >>sec-type="Introduction") at the position of the tagged word
> >>(“Introduction” in this case), and hence I need the HTML tags when
> >>indexing. However, I want to strip the HTML from the stored string that
> >>is shown to the user on a query. So far, I have figured out that the index
> >>and the stored string are separate. Thus, I thought it should be
> >>possible to manipulate the stored string after indexing.
> >>
> >>Is there a way to do so? I would prefer to manipulate the stored string
> >>and not introduce a second field with the plain text in the input file.
> >>
> >>I am glad for any help!
> >>
> >>Best Regards,
> >>
> >>Adrian
> >
> >--
> >Uwe Schindler
> >Achterdiek 19, 28357 Bremen
> >https://www.thetaphi.de
>
>



-- 
Sincerely yours
Mikhail Khludnev
