Hey, you could use the analyze API with the char_filter to get the extracted text back in parts (as individual tokens), see https://gist.github.com/clintongormley/780895. However, Elasticsearch does not store the HTML-free text anywhere as a complete block that you could read back. If you want that, you need to strip the HTML yourself before indexing.
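To illustrate the "do it before indexing" route, here is a minimal sketch in plain Java. It is not Elasticsearch's html_strip implementation; the class name HtmlStripSketch and the helper stripHtml are made up for this example, and the regex approach is deliberately naive (it assumes reasonably well-formed markup and decodes only a few entities). A real HTML parser would be more robust. The idea would be to index the stripped text in a separate field and highlight on that field instead of the raw HTML one.

```java
import java.util.regex.Pattern;

public class HtmlStripSketch {
    // Naive patterns (assumption: reasonably well-formed markup).
    // First drop <script>/<style> blocks together with their contents,
    // then drop any remaining tag.
    private static final Pattern SCRIPT_OR_STYLE =
            Pattern.compile("(?is)<(script|style)\\b[^>]*>.*?</\\1\\s*>");
    private static final Pattern TAG = Pattern.compile("(?s)<[^>]+>");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    /** Hypothetical helper: returns the text of an HTML fragment with tags removed. */
    public static String stripHtml(String html) {
        String text = SCRIPT_OR_STYLE.matcher(html).replaceAll(" ");
        // Replace tags with a space so words separated only by markup do not fuse.
        text = TAG.matcher(text).replaceAll(" ");
        // Decode only a handful of common entities; "&amp;" goes last so that
        // "&amp;lt;" ends up as the literal text "&lt;" rather than "<".
        text = text.replace("&nbsp;", " ")
                   .replace("&lt;", "<")
                   .replace("&gt;", ">")
                   .replace("&quot;", "\"")
                   .replace("&amp;", "&");
        // Collapse the whitespace left behind by the removed markup.
        return WHITESPACE.matcher(text).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        String html = "<html><head><title>Listing</title></head>"
                + "<body><p>A <b>house</b> by the sea &amp; more</p></body></html>";
        // The stripped string is what you would put into a second, plain-text field.
        System.out.println(stripHtml(html));
    }
}
```

Because every tag is replaced by a space, a word that is split by inline markup (for example "hou<b>se</b>") would come out as two words; that is one of the trade-offs of this simple approach.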
The char_filter is basically there to make sure that a search for 'title' will not match every web page that contains a '<title>' tag.

I am not a hundred percent sure this was your question, so feel free to ask further and tell me where I might have misunderstood you.

--Alex

On Thu, Dec 19, 2013 at 8:55 PM, Adolfo Rodriguez <[email protected]> wrote:
> Hi, I searched the documentation and the internet but could not find any
> accurate information on this.
>
> I have a highlight query which is working properly:
>
> SearchResponse response = getClient().prepareSearch()
>         .setIndices("myindex")
>         .setTypes("mytype")
>         .setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
>         .setQuery(QueryBuilders
>                 .boolQuery()
>                 .should(QueryBuilders.matchQuery("myfield", "house"))
>         )
>         .addHighlightedField("myfield", 250, 1)
>         .setFrom(0)
>         .setSize(25)
>         .execute()
>         .actionGet();
>
> The query is fetching results from myfield, which contains indexed HTML
> content. The highlighted result contains HTML tags, and I would like to
> strip the HTML out of the response. I found the HTML Strip Char Filter
> <http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-htmlstrip-charfilter.html>
> but do not know the syntax to add it as a request analyzer *in Java*.
>
> I have found examples in Java to create indices including the analyzer
> <http://jaibeermalik.wordpress.com/2013/03/26/elasticsearch-text-analysis-for-content-enrichment/>
> but none to include the analyzer in a Java request, which the documentation
> <http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis.html>
> says is possible:
>
> *The index analysis module acts as a configurable registry of Analyzers
> that can be used in order to both break indexed (analyzed) fields when a
> document is indexed and process query strings*
>
> Any pointer to an example would be very appreciated.
>
> Thanks.
