The field returned in the results is derived from the source, not regenerated from the tokens. If we indexed the sentence "The quick brown foxes jumped over the lazy dogs" with the english analyzer, the tokens would be:

http://localhost:9200/_analyze?text=The%20quick%20brown%20foxes%20jumped%20over%20the%20lazy%20dogs&analyzer=english

quick brown fox jump over lazi dog

After applying stopwords and stemming, the tokens do not form a sentence that looks like the original.

--
Ivan

On Fri, Aug 8, 2014 at 9:42 AM, IronMike <[email protected]> wrote:

> Ivan,
>
> The search results I am showing is for the field "title" not for the
> source. I thought I could query the field not the source and look at it
> with no html while the source was intact. Did I misunderstand?
>
>
> On Friday, August 8, 2014 12:36:16 PM UTC-4, Ivan Brusic wrote:
>
>> The analyzers control how text is parsed/tokenized and how terms are
>> indexed in the inverted index. The source document remains untouched.
>>
>> --
>> Ivan
>>
>>
>> On Fri, Aug 8, 2014 at 9:24 AM, IronMike <[email protected]> wrote:
>>
>>> I also used Clint's example and tried to map it to a document and search
>>> the field, but still getting html in query results... Here is my code. I
>>> appreciate the help.
>>>
>>> //Tokenizer
>>>
>>> PUT /foo/
>>> {
>>>   "settings": {
>>>     "index" : {
>>>       "analysis" : {
>>>         "analyzer" : {
>>>           "test_1" : {
>>>             "char_filter" : [
>>>               "html_strip"
>>>             ],
>>>             "tokenizer" : "standard"
>>>           }
>>>         }
>>>       }
>>>     }
>>>   }
>>> }
>>>
>>>
>>> //Mapping
>>> PUT /foo/foo_type/_mapping
>>> {
>>>   "foo_type":{
>>>     "properties" : {
>>>       "title": {
>>>         "type":"string",
>>>         "index": "analyzed",
>>>         "analyzer":"test_1"
>>>       }
>>>     }
>>>   }
>>> }
>>>
>>>
>>> Get /foo/foo_type/_mapping
>>> {
>>>   "foo": {
>>>     "mappings": {
>>>       "foo_type": {
>>>         "properties": {
>>>           "date": {
>>>             "type": "date",
>>>             "format": "dateOptionalTime"
>>>           },
>>>           "title": {
>>>             "type": "string",
>>>             "analyzer": "test_1"
>>>           }
>>>         }
>>>       }
>>>     }
>>>   }
>>> }
>>>
>>>
>>> ////Index/////////////
>>> PUT /foo/foo_type/1
>>> {
>>>   "date" : "2009-11-15T14:12:12",
>>>   "title" : "The quick & <b>brown</b> fox"
>>> }
>>>
>>>
>>> //Search //////////
>>> GET /foo/_search?pretty:true
>>> {
>>>   "fields": ["title"],
>>>   "query": {
>>>     "query_string": {
>>>       "query": "brown",
>>>       "analyzer": "test_1"
>>>     }
>>>   }
>>> }
>>>
>>>
>>> //Results showing html tags still//////
>>> "hits": [
>>>   {
>>>     "_index": "foo",
>>>     "_type": "foo_type",
>>>     "_id": "1",
>>>     "_score": 0.076713204,
>>>     "fields": {
>>>       "title": [
>>>         "The quick & <b>brown</b> fox"
>>>       ]
>>>     }
>>>
>>>
>>>
>>> On Thursday, August 7, 2014 6:06:56 PM UTC-4, Jörg Prante wrote:
>>>
>>>> Have you checked Clint's example?
>>>>
>>>> https://gist.github.com/clintongormley/780895
>>>>
>>>> Jörg
>>>>
>>>>
>>>> On Thu, Aug 7, 2014 at 8:23 PM, IronMike <[email protected]> wrote:
>>>>
>>>>> I would like to strip html tags for indexing. Here is a simple
>>>>> example I tried so far, but doesn't seem to strip html tags. Any ideas
>>>>> what's missing?
>>>>>
>>>>> //settings & Mappings
>>>>> POST twitter
>>>>> {
>>>>>   "mappings": {
>>>>>     "tweet" : {
>>>>>       "properties" : {
>>>>>         "message" : {
>>>>>           "type" : "string",
>>>>>           "analyzer": "strip_html_analyzer"
>>>>>         },
>>>>>         "date" : {
>>>>>           "type" : "date"
>>>>>         },
>>>>>         "name" : {
>>>>>           "type" : "string"
>>>>>         }
>>>>>       }
>>>>>     }
>>>>>   },
>>>>>   "settings": {
>>>>>     "analysis": {
>>>>>       "analyzer": {
>>>>>         "strip_html_analyzer":{
>>>>>           "type":"custom",
>>>>>           "tokenizer":"standard",
>>>>>           "filter":"standard",
>>>>>           "char_filter":"my_html"
>>>>>         }
>>>>>       },
>>>>>       "char_filter": {
>>>>>         "my_html":{
>>>>>           "type":"html_strip"
>>>>>         }
>>>>>       }
>>>>>     }
>>>>>   }
>>>>> }
>>>>>
>>>>>
>>>>> //Index a document
>>>>> PUT /twitter/tweet/1
>>>>> {
>>>>>   "name" : "mike",
>>>>>   "date" : "2009-11-15T14:12:12",
>>>>>   "message" : "<html>trying out <b>Elasticsearch</b>, This is an html test</html>"
>>>>> }
>>>>>
>>>>>
>>>>> //query result for "html", I expect the query to return nothing since
>>>>> it is supposed to strip the tag?
>>>>> "hits": {
>>>>>   "total": 1,
>>>>>   "max_score": 0.11626227,
>>>>>   "hits": [
>>>>>     {
>>>>>       "_index": "twitter",
>>>>>       "_type": "tweet",
>>>>>       "_id": "1",
>>>>>       "_score": 0.11626227,
>>>>>       "fields": {
>>>>>         "message": [
>>>>>           "<html>trying out <b>Elasticsearch</b>, This is an html test</html>"
>>>>>         ]
>>>>>       },
>>>>>       "highlight": {
>>>>>         "message": [
>>>>>           "<html>trying out <b>Elasticsearch</b>, This is an <em>html</em> test</html>"
>>>>>         ]
>>>>>       }
>>>>>     }
>>>>>   ]
>>>>> }
>>>>>
>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "elasticsearch" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to [email protected].
>>>>> To view this discussion on the web visit
>>>>> https://groups.google.com/d/msgid/elasticsearch/517fe8b8-0b38-4646-bc8f-a27896454515%40googlegroups.com
>>>>> <https://groups.google.com/d/msgid/elasticsearch/517fe8b8-0b38-4646-bc8f-a27896454515%40googlegroups.com?utm_medium=email&utm_source=footer>.
>>>>> For more options, visit https://groups.google.com/d/optout.
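
To make the point at the top of this reply concrete, here is a toy Python model of the behaviour. It is not Elasticsearch code: `analyze`, `ToyIndex`, and the regex-based tag stripping are invented for illustration, and the lowercasing step is an assumption of the sketch rather than something the `test_1` analyzer does. It only shows that analysis shapes the terms in the inverted index while the stored document is returned unchanged.

```python
import re
from html import unescape

# Crude stand-in for an html_strip-style character filter (assumption:
# anything between '<' and '>' is a tag).
TAG_RE = re.compile(r"<[^>]+>")


def analyze(text):
    """Strip tags, decode entities, split into lowercased word tokens."""
    stripped = unescape(TAG_RE.sub(" ", text))
    return [token.lower() for token in re.findall(r"\w+", stripped)]


class ToyIndex:
    """Keeps the original document and a separate inverted index."""

    def __init__(self):
        self.sources = {}   # doc id -> original document, stored verbatim
        self.inverted = {}  # term -> set of doc ids

    def index(self, doc_id, doc, field):
        self.sources[doc_id] = doc  # the source is never modified
        for term in analyze(doc[field]):
            self.inverted.setdefault(term, set()).add(doc_id)

    def search(self, term):
        ids = sorted(self.inverted.get(term.lower(), set()))
        return [self.sources[i] for i in ids]


idx = ToyIndex()
idx.index(1, {"title": "The quick & <b>brown</b> fox"}, "title")

print(analyze("The quick & <b>brown</b> fox"))  # terms that get indexed
print(idx.search("brown"))  # hit; the returned document still contains <b>
print(idx.search("b"))      # no hit; the tag name was never indexed
```

In this sketch a search for "brown" matches, a search for the tag name "b" does not, and the document that comes back is exactly what was indexed, markup included. If tag-free text is wanted in the results, the markup would have to be removed before indexing or when rendering.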
