1. Correct.
2. Also correct. The analysis chain only affects how the terms are indexed
and placed in the inverted index. The original document remains as is.
3. Not sure, since I have never used highlighting. Highlighting might not
depend on _source, since the term positions/offsets are used, but
hopefully someone will correct me.
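
For points 1 and 2, here is a minimal sketch in Python of what this could
look like. The settings body uses the built-in html_strip char filter from
the page linked in the question; the analyzer name "html_content" is made up
for illustration, and the content field's mapping would still need to
reference it (the mapping syntax depends on your Elasticsearch version, so it
is left out). The approximate_indexed_terms helper is hypothetical and only
imitates the effect of stripping tags before tokenizing. It is not how
Elasticsearch implements it.

```python
import json
from html.parser import HTMLParser

# Index settings: a custom analyzer that runs the html_strip char filter
# before tokenizing. "html_content" is an assumed, illustrative name.
INDEX_SETTINGS = {
    "settings": {
        "analysis": {
            "analyzer": {
                "html_content": {
                    "type": "custom",
                    "char_filter": ["html_strip"],
                    "tokenizer": "standard",
                    "filter": ["lowercase"],
                }
            }
        }
    }
}


class _TextOnly(HTMLParser):
    """Collects only the text between tags (a rough stand-in for html_strip)."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def approximate_indexed_terms(html_text):
    """Hypothetical helper: roughly the terms that would reach the index."""
    parser = _TextOnly()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.chunks).lower().split()


# The document as it would be sent to the index; this is what _source keeps.
doc = {
    "content": "<html> Hello this is an html <b>content</b></html>",
    "title": "Some title",
}

terms = approximate_indexed_terms(doc["content"])

print(json.dumps(INDEX_SETTINGS, indent=2))
print(terms)           # tag-free terms, as an approximation of the index side
print(doc["content"])  # the original html is untouched
```

The point of the sketch is the separation: the tags are dropped only on the
analysis side, while the document you sent stays exactly as it was.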

-- 
Ivan


On Wed, Aug 6, 2014 at 11:45 AM, IronMike <[email protected]> wrote:

> I searched this topic but some of the answers were still vague to me.
>
> My goal is to index html docs but have the html stripped for the indexing,
> at the same time, I would like _source to have the original html document
> for display purposes.
>
> //My doc format:
> {
>   "content": "<html> Hello this is an html <b>content</b> ....</html>",
>   "rank": 1,
>   "date": "2014-8-8",
>   "title": "Some title",
>   ....
> }
>
> The questions that I am still not very clear on:
>
> 1 - If I understand correctly, I can push the html doc as is to the index,
> and it will strip the html provided I set up the char filter referenced here?
>
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-htmlstrip-charfilter.html
>
> 2- Will the stripping not affect the _source? In other words, _source will
> still have the original html?
>
> 3- Does highlighting come from the _source? If so, highlights will contain
> html, meaning I will have to strip any html tags after the search comes
> back?
>
>
> Thanks
>
>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/6be77d25-f7fe-4a35-a247-932f93f07150%40googlegroups.com
> <https://groups.google.com/d/msgid/elasticsearch/6be77d25-f7fe-4a35-a247-932f93f07150%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
> For more options, visit https://groups.google.com/d/optout.
>

