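1. Correct, provided the content field is mapped to an analyzer that includes the html_strip char filter.
2. Also correct. The analysis chain only affects which terms are produced and placed in the inverted index. The original document in _source stays exactly as you sent it, HTML included.
3. I have never used highlighting myself, so take this with a grain of salt. As far as I know, highlighting needs the original field text: if the field is not stored, Elasticsearch extracts it from _source. The matches are then located using the term offsets. Because the html_strip char filter maps those offsets back to positions in the original text, the fragments are cut from the original HTML and may contain tags. You will probably have to strip or escape that markup in the returned fragments yourself. Hopefully someone will correct me if I am wrong.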
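To make points 1 to 3 concrete, here is a toy model in Python. It is not Elasticsearch code. `INDEX_SETTINGS` shows the kind of custom analyzer you would define, and the analyzer name `html_content` is made up. `strip_html`, `analyze` and `highlight` are hypothetical helpers that mimic the idea. Tags are stripped before tokenizing, but every token keeps its offsets into the original text. A "highlight" built from those offsets is therefore cut from the untouched HTML source. The real html_strip filter also decodes entities and handles block tags differently.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Index settings you might send when creating the index (analyzer name is hypothetical).
# The content field's mapping would then reference it with "analyzer": "html_content".
INDEX_SETTINGS = {
    "settings": {
        "analysis": {
            "analyzer": {
                "html_content": {
                    "type": "custom",
                    "char_filter": ["html_strip"],
                    "tokenizer": "standard",
                    "filter": ["lowercase"],
                }
            }
        }
    }
}

TAG = re.compile(r"<[^>]*>")  # toy tag matcher; no entity decoding, no comments/scripts
WORD = re.compile(r"\w+")     # toy tokenizer


@dataclass
class Token:
    term: str
    start: int  # offset into the ORIGINAL (unstripped) text
    end: int


def strip_html(text: str) -> tuple[str, list[int]]:
    """Drop tags, remembering for each kept char its position in the original text."""
    chars: list[str] = []
    positions: list[int] = []
    last = 0
    for m in TAG.finditer(text):
        for i in range(last, m.start()):  # copy text between tags unchanged
            chars.append(text[i])
            positions.append(i)
        chars.append(" ")  # replace the tag with a space so words stay separate
        positions.append(m.start())
        last = m.end()
    for i in range(last, len(text)):
        chars.append(text[i])
        positions.append(i)
    return "".join(chars), positions


def analyze(text: str) -> list[Token]:
    """Tokenize the stripped text, but report offsets in the original text."""
    stripped, positions = strip_html(text)
    return [
        Token(m.group().lower(), positions[m.start()], positions[m.end() - 1] + 1)
        for m in WORD.finditer(stripped)
    ]


def highlight(source: str, term: str, pre: str = "<em>", post: str = "</em>") -> str:
    """Wrap matches of `term` using offsets; the text comes from the original source."""
    out: list[str] = []
    last = 0
    for tok in analyze(source):
        if tok.term == term.lower():
            out.append(source[last:tok.start])
            out.append(pre + source[tok.start:tok.end] + post)
            last = tok.end
    out.append(source[last:])
    return "".join(out)


if __name__ == "__main__":
    source = "<html> Hello this is an html <b>content</b></html>"
    print([t.term for t in analyze(source)])  # tag names like "b" are not indexed
    print(highlight(source, "content"))
    # prints: <html> Hello this is an html <b><em>content</em></b></html>
```

The indexed terms contain no markup, while `source` itself is never modified. The highlighted string still carries the surrounding `<b>` and `<html>` tags from the original, which is why you may need to clean up fragments on the client side.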
--
Ivan

On Wed, Aug 6, 2014 at 11:45 AM, IronMike wrote:

> I searched this topic but some of the answers were still vague to me.
>
> My goal is to index html docs but have the html stripped for the indexing,
> at the same time, I would like _source to have the original html document
> for display purposes.
>
> //My doc format:
> {
> content: <html> Hello this is an html <b>content</b> ....</html>
> rank:1
> date:2014-8-8
> title: Some title
> ....
> }
>
> The questions that I am still not very clear on:
>
> 1 - if I understand correctly, I can push html doc like it is to Index,
> and it will strip html provided I do the charfilter referenced here?
>
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-htmlstrip-charfilter.html
>
> 2- Will the stripping not affect the _source? In other words, _source will
> still have the original html?
>
> 3- Highlighting comes from the _source? this means highlighting will have
> html, meaning I will have to strip any html tags after the search comes
> back?
>
> Thanks
