Re: html stripped highlighted text from html Content field

Alexander Reelsen Fri, 20 Dec 2013 03:20:08 -0800

Hey,

you could use the analyze API with the html_strip char_filter to get the
extracted text back in parts, see https://gist.github.com/clintongormley/780895
However, Elasticsearch does not store the text without the HTML anywhere as
a complete block that you could read out. If you want that, you
would need to strip the HTML yourself before indexing.
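
A minimal sketch of the "strip before indexing" route, in plain Java with no
Elasticsearch dependency. The class name, the helper names and the second
field "myfield_text" are made up for illustration, and the regex-based
stripping is deliberately naive; a real HTML parser would be the safer
choice for messy markup.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper: builds a document source that carries both the raw
// HTML and a tag-free copy, so highlighting can run on the clean field.
class HtmlStripSketch {
    // Drop <script>/<style> elements including their content.
    private static final Pattern SCRIPT_OR_STYLE =
            Pattern.compile("(?is)<(script|style)\\b.*?</\\1\\s*>");
    // Naive tag matcher; does not handle '>' inside attribute values.
    private static final Pattern TAG = Pattern.compile("(?s)<[^>]*>");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    static String stripHtml(String html) {
        String text = SCRIPT_OR_STYLE.matcher(html).replaceAll(" ");
        text = TAG.matcher(text).replaceAll(" ");
        // Decode only a handful of common entities; "&amp;" goes last so
        // that "&amp;lt;" is not decoded twice.
        text = text.replace("&nbsp;", " ")
                   .replace("&lt;", "<")
                   .replace("&gt;", ">")
                   .replace("&quot;", "\"")
                   .replace("&amp;", "&");
        return WHITESPACE.matcher(text).replaceAll(" ").trim();
    }

    // "myfield" is the field from the question; "myfield_text" is an
    // assumed extra field holding the stripped text.
    static Map<String, Object> toDocument(String html) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("myfield", html);
        doc.put("myfield_text", stripHtml(html));
        return doc;
    }

    public static void main(String[] args) {
        String html = "<html><head><title>Listing</title></head>"
                + "<body><p>A <b>house</b> by the sea</p></body></html>";
        System.out.println(toDocument(html).get("myfield_text"));
    }
}
```

The resulting map could then be used as the source of the index request, and
the highlight request pointed at the stripped field instead of the raw one.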


The char_filter is basically there to make sure that a search for 'title'
does not match every web page which contains a '<title>' tag.

Not a hundred percent sure if this answers your question, so feel free to ask
further and point out where I might have misunderstood you.


--Alex


On Thu, Dec 19, 2013 at 8:55 PM, Adolfo Rodriguez <[email protected]> wrote:

> Hi, I searched documentation and internet but could not find any accurate
> information on this.
>
> I have a highlight query which is working properly:
>
> SearchResponse response = getClient().prepareSearch()
>     .setIndices("myindex")
>     .setTypes("mytype")
>     .setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
>     .setQuery(QueryBuilders.boolQuery()
>         .should(QueryBuilders.matchQuery("myfield", "house")))
>     .addHighlightedField("myfield", 250, 1)
>     .setFrom(0)
>     .setSize(25)
>     .execute()
>     .actionGet();
>
> The query fetches results from myfield, which contains indexed HTML
> content. The highlighted result contains HTML tags, and I would like to
> strip the HTML out of the response. I found the HTML Strip Char Filter
> <http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-htmlstrip-charfilter.html>
> but do not know the syntax to add it as a request analyzer *in Java*.
>
> I have found examples in Java that create indices including the analyzer
> <http://jaibeermalik.wordpress.com/2013/03/26/elasticsearch-text-analysis-for-content-enrichment/>
> but none that include the analyzer in a Java request, which the documentation
> <http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis.html>
> says is possible:
>
> *The index analysis module acts as a configurable registry of Analyzers
> that can be used in order to both break indexed (analyzed) fields when a
> document is indexed and process query strings*
>
>
> Any pointer to an example would be much appreciated.
>
> Thanks.
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/d83716b0-1461-4796-9d03-b7d7cb268ef7%40googlegroups.com
> .
> For more options, visit https://groups.google.com/groups/opt_out.
>

