Ivan,

The search results I am showing are for the field "title", not for the source. I thought I could query the field rather than the source and see it without HTML while the source stayed intact. Did I misunderstand?
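To make the question concrete, here is a toy model in plain Python of the behavior Ivan describes. It is not Elasticsearch code: `ToyIndex`, `analyze`, and the two regexes are hypothetical stand-ins for the index, the `test_1` analyzer, the `html_strip` char filter, and the standard tokenizer. It only sketches the idea that analysis shapes the terms used for matching, while the value returned for a hit is read back from the unmodified original document.

```python
import re
from collections import defaultdict

TAG_RE = re.compile(r"<[^>]+>")   # crude stand-in for the html_strip char filter
TOKEN_RE = re.compile(r"\w+")     # crude stand-in for the standard tokenizer


def analyze(text):
    """Strip tags, then split into word tokens (the index-time view only)."""
    return TOKEN_RE.findall(TAG_RE.sub(" ", text))


class ToyIndex:
    """Hypothetical miniature index: analyzed terms plus untouched source."""

    def __init__(self):
        self.inverted = defaultdict(set)  # term -> ids of documents containing it
        self.source = {}                  # id -> original document, never modified

    def index(self, doc_id, doc, field):
        self.source[doc_id] = doc
        for term in analyze(doc[field]):
            self.inverted[term].add(doc_id)

    def search(self, term, field):
        # Matching uses the analyzed terms; returned values come from the source.
        return [self.source[i][field] for i in sorted(self.inverted.get(term, ()))]


idx = ToyIndex()
idx.index("1", {"title": "The quick & <b>brown</b> fox"}, "title")

print(analyze("The quick & <b>brown</b> fox"))  # terms that would be indexed
print(idx.search("brown", "title"))             # hit, returned with its tags
print(idx.search("b", "title"))                 # the tag name itself is not a term
```

Under this model, a search for "brown" matches because the tags were stripped before tokenizing, yet the hit still shows `<b>brown</b>` because nothing ever rewrote the stored text. If that matches how Elasticsearch behaves, getting a tag-free value back would mean stripping the HTML before indexing or after retrieval.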
On Friday, August 8, 2014 12:36:16 PM UTC-4, Ivan Brusic wrote:
>
> The analyzers control how text is parsed/tokenized and how terms are
> indexed in the inverted index. The source document remains untouched.
>
> --
> Ivan
>
>
> On Fri, Aug 8, 2014 at 9:24 AM, IronMike <[email protected]> wrote:
>
>> I also used Clint's example and tried to map it to a document and search
>> the field, but still getting html in query results... Here is my code. I
>> appreciate the help.
>>
>> //Tokenizer
>>
>> PUT /foo/
>> {
>>   "settings": {
>>     "index": {
>>       "analysis": {
>>         "analyzer": {
>>           "test_1": {
>>             "char_filter": [
>>               "html_strip"
>>             ],
>>             "tokenizer": "standard"
>>           }
>>         }
>>       }
>>     }
>>   }
>> }
>>
>>
>> //Mapping
>> PUT /foo/foo_type/_mapping
>> {
>>   "foo_type": {
>>     "properties": {
>>       "title": {
>>         "type": "string",
>>         "index": "analyzed",
>>         "analyzer": "test_1"
>>       }
>>     }
>>   }
>> }
>>
>>
>> Get /foo/foo_type/_mapping
>> {
>>   "foo": {
>>     "mappings": {
>>       "foo_type": {
>>         "properties": {
>>           "date": {
>>             "type": "date",
>>             "format": "dateOptionalTime"
>>           },
>>           "title": {
>>             "type": "string",
>>             "analyzer": "test_1"
>>           }
>>         }
>>       }
>>     }
>>   }
>> }
>>
>>
>> ////Index/////////////
>> PUT /foo/foo_type/1
>> {
>>   "date": "2009-11-15T14:12:12",
>>   "title": "The quick & <b>brown</b> fox"
>> }
>>
>>
>> //Search //////////
>> GET /foo/_search?pretty:true
>> {
>>   "fields": ["title"],
>>   "query": {
>>     "query_string": {
>>       "query": "brown",
>>       "analyzer": "test_1"
>>     }
>>   }
>> }
>>
>>
>> //Results showing html tags still//////
>> "hits": [
>>   {
>>     "_index": "foo",
>>     "_type": "foo_type",
>>     "_id": "1",
>>     "_score": 0.076713204,
>>     "fields": {
>>       "title": [
>>         "The quick & <b>brown</b> fox"
>>       ]
>>     }
>>
>>
>>
>> On Thursday, August 7, 2014 6:06:56 PM UTC-4, Jörg Prante wrote:
>>
>>> Have you checked Clint's example?
>>>
>>> https://gist.github.com/clintongormley/780895
>>>
>>> Jörg
>>>
>>>
>>> On Thu, Aug 7, 2014 at 8:23 PM, IronMike <[email protected]> wrote:
>>>
>>>> I would like to strip html tags for indexing. Here is a simple
>>>> example I tried so far, but doesn't seem to strip html tags. Any ideas
>>>> what's missing?
>>>>
>>>> //settings & Mappings
>>>> POST twitter
>>>> {
>>>>   "mappings": {
>>>>     "tweet": {
>>>>       "properties": {
>>>>         "message": {
>>>>           "type": "string",
>>>>           "analyzer": "strip_html_analyzer"
>>>>         },
>>>>         "date": {
>>>>           "type": "date"
>>>>         },
>>>>         "name": {
>>>>           "type": "string"
>>>>         }
>>>>       }
>>>>     }
>>>>   },
>>>>   "settings": {
>>>>     "analysis": {
>>>>       "analyzer": {
>>>>         "strip_html_analyzer": {
>>>>           "type": "custom",
>>>>           "tokenizer": "standard",
>>>>           "filter": "standard",
>>>>           "char_filter": "my_html"
>>>>         }
>>>>       },
>>>>       "char_filter": {
>>>>         "my_html": {
>>>>           "type": "html_strip"
>>>>         }
>>>>       }
>>>>     }
>>>>   }
>>>> }
>>>>
>>>>
>>>> //Index a document
>>>> PUT /twitter/tweet/1
>>>> {
>>>>   "name": "mike",
>>>>   "date": "2009-11-15T14:12:12",
>>>>   "message": "<html>trying out <b>Elasticsearch</b>, This is an html test</html>"
>>>> }
>>>>
>>>>
>>>> //query result for "html", I expect the query to return nothing since
>>>> //it is supposed to strip the tag?
>>>> "hits": {
>>>>   "total": 1,
>>>>   "max_score": 0.11626227,
>>>>   "hits": [
>>>>     {
>>>>       "_index": "twitter",
>>>>       "_type": "tweet",
>>>>       "_id": "1",
>>>>       "_score": 0.11626227,
>>>>       "fields": {
>>>>         "message": [
>>>>           "<html>trying out <b>Elasticsearch</b>, This is an html test</html>"
>>>>         ]
>>>>       },
>>>>       "highlight": {
>>>>         "message": [
>>>>           "<html>trying out <b>Elasticsearch</b>, This is an <em>html</em> test</html>"
>>>>         ]
>>>>       }
>>>>     }
>>>>   ]
>>>> }
>>>>
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "elasticsearch" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to [email protected].
>>>> To view this discussion on the web visit
>>>> https://groups.google.com/d/msgid/elasticsearch/517fe8b8-0b38-4646-bc8f-a27896454515%40googlegroups.com.
>>>> For more options, visit https://groups.google.com/d/optout.
