Ivan,

The search results I am showing are for the field "title", not for the 
source. I thought I could query the field rather than the source, and see 
it without HTML while the source stayed intact. Did I misunderstand?
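
To check my own understanding of the analyzer/source split, here is a toy 
sketch in Python. It is not how Elasticsearch is implemented: ToyIndex, 
analyze and the two regexes are names I made up, and the "analyzer" only 
mimics test_1 roughly (strip tags, then split into words, no lowercasing). 
The point is that the terms used for matching go through the analyzer, 
while the value handed back for a hit is the original string.

```python
import re
from collections import defaultdict
from html import unescape

# Hypothetical helpers for illustration only; not Elasticsearch code.
TAG_RE = re.compile(r"<[^>]+>")   # crude stand-in for the html_strip char filter
TOKEN_RE = re.compile(r"\w+")     # crude stand-in for the standard tokenizer


def analyze(text):
    """Strip tags, decode entities, split into word tokens (no lowercasing)."""
    stripped = unescape(TAG_RE.sub(" ", text))
    return TOKEN_RE.findall(stripped)


class ToyIndex:
    def __init__(self):
        self.postings = defaultdict(set)  # term -> ids of docs containing it
        self.sources = {}                 # doc id -> original field value

    def index(self, doc_id, title):
        self.sources[doc_id] = title      # kept verbatim, never analyzed
        for term in analyze(title):
            self.postings[term].add(doc_id)

    def search(self, query):
        hits = set()
        for term in analyze(query):
            hits |= self.postings.get(term, set())
        # Hits are returned with the original value, not the analyzed terms.
        return [(doc_id, self.sources[doc_id]) for doc_id in sorted(hits)]


idx = ToyIndex()
idx.index("1", "The quick & <b>brown</b> fox")

print(sorted(idx.postings))   # indexed terms: no tags among them
print(idx.search("brown"))    # matches, and returns the original string
print(idx.search("b"))        # the tag name itself was never indexed
```

In this model a search for "brown" matches and still comes back with the 
<b> tags, while a search for "b" finds nothing. If that mirrors what 
Elasticsearch does, then tag-free text in the response would have to be 
produced by cleaning the value before it is indexed or after it is returned.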

On Friday, August 8, 2014 12:36:16 PM UTC-4, Ivan Brusic wrote:
>
> The analyzers control how text is parsed/tokenized and how terms are 
> indexed in the inverted index. The source document remains untouched.
>
> -- 
> Ivan
>
>
> On Fri, Aug 8, 2014 at 9:24 AM, IronMike <[email protected]> wrote:
>
>> I also used Clint's example and tried to map it to a document and search 
>> the field, but I am still getting HTML in the query results. Here is my 
>> code. I appreciate the help.
>>
>> //Index settings (custom analyzer)
>>
>> PUT /foo/
>> {
>>  "settings": {
>>    "index" : {
>>       "analysis" : {
>>          "analyzer" : {
>>             "test_1" : {
>>                "char_filter" : [
>>                   "html_strip"
>>                ],
>>                "tokenizer" : "standard"
>>             }
>>          }
>>       }
>>    }
>>  }
>> }
>>
>>
>> //Mapping
>> PUT /foo/foo_type/_mapping
>> {
>>   "foo_type":{ 
>>          "properties" : {
>>                    "title": {
>>                          "type":"string",
>>                          "index": "analyzed", 
>>                          "analyzer":"test_1"
>>                          }
>>                        }
>>            }
>> }
>>
>>
>> GET /foo/foo_type/_mapping
>> {
>>    "foo": {
>>       "mappings": {
>>          "foo_type": {
>>             "properties": {
>>                "date": {
>>                   "type": "date",
>>                   "format": "dateOptionalTime"
>>                },
>>                "title": {
>>                   "type": "string",
>>                   "analyzer": "test_1"
>>                }
>>             }
>>          }
>>       }
>>    }
>> }
>>
>>
>> ////Index/////////////
>> PUT /foo/foo_type/1
>> {
>>     "date" : "2009-11-15T14:12:12",
>>     "title" : "The quick & <b>brown</b> fox"
>> }
>>
>>
>> //Search //////////
>> GET /foo/_search?pretty=true
>> {
>>    "fields": ["title"], 
>>     "query": {
>>         "query_string": {
>>             "query": "brown",
>>             "analyzer": "test_1"
>>         }
>>     }
>> }
>>
>>
>> //Results showing html tags still//////
>> "hits": [
>>          {
>>             "_index": "foo",
>>             "_type": "foo_type",
>>             "_id": "1",
>>             "_score": 0.076713204,
>>             "fields": {
>>                "title": [
>>                   "The quick & <b>brown</b> fox" 
>>                ]
>>             }
>>          }
>> ]
>>
>>
>>
>> On Thursday, August 7, 2014 6:06:56 PM UTC-4, Jörg Prante wrote:
>>
>>> Have you checked Clint's example?
>>>
>>> https://gist.github.com/clintongormley/780895
>>>
>>> Jörg
>>>
>>>
>>> On Thu, Aug 7, 2014 at 8:23 PM, IronMike <[email protected]> wrote:
>>>
>>>>  I would like to strip HTML tags for indexing. Here is a simple 
>>>> example I have tried so far, but it doesn't seem to strip the HTML 
>>>> tags. Any ideas what's missing?
>>>>
>>>> //settings & Mappings
>>>> POST twitter
>>>> {
>>>>   "mappings": {
>>>>     "tweet" : {
>>>>       "properties" : {
>>>>         "message" : {
>>>>           "type" :    "string",
>>>>           "analyzer": "strip_html_analyzer"
>>>>         },
>>>>         "date" : {
>>>>           "type" :   "date"
>>>>         },
>>>>         "name" : {
>>>>           "type" :   "string"
>>>>         }
>>>>       }
>>>>     }
>>>>   },
>>>>   "settings": {
>>>>     "analysis": {
>>>>       "analyzer": {
>>>>         "strip_html_analyzer":{
>>>>             "type":"custom",
>>>>             "tokenizer":"standard",
>>>>             "filter":"standard",
>>>>             "char_filter":"my_html"
>>>>         }
>>>>       },
>>>>       "char_filter": {
>>>>           "my_html":{
>>>>               "type":"html_strip"
>>>>           }
>>>>       }
>>>>     }
>>>>   }
>>>> }
>>>>
>>>>
>>>> //Index a document
>>>> PUT /twitter/tweet/1
>>>> {
>>>>     "name" : "mike",
>>>>     "date" : "2009-11-15T14:12:12",
>>>>     "message" : "<html>trying out <b>Elasticsearch</b>, This is an html test</html>"
>>>> }
>>>>
>>>>
>>>> //Query result for "html". I expected the query to return nothing, since 
>>>> //the analyzer is supposed to strip the tag?
>>>> "hits": {
>>>>       "total": 1,
>>>>       "max_score": 0.11626227,
>>>>       "hits": [
>>>>          {
>>>>             "_index": "twitter",
>>>>             "_type": "tweet",
>>>>             "_id": "1",
>>>>             "_score": 0.11626227,
>>>>             "fields": {
>>>>                "message": [
>>>>                   "<html>trying out <b>Elasticsearch</b>, This is an html test</html>"
>>>>                ]
>>>>             },
>>>>             "highlight": {
>>>>                "message": [
>>>>                   "<html>trying out <b>Elasticsearch</b>, This is an <em>html</em> test</html>"
>>>>                ]
>>>>             }
>>>>          }
>>>>       ]
>>>>    }
>>>>
>>>>
>>>>
>>>>
>>>>  -- 
>>>> You received this message because you are subscribed to the Google 
>>>> Groups "elasticsearch" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>> an email to [email protected].
>>>>
>>>> To view this discussion on the web visit 
>>>> https://groups.google.com/d/msgid/elasticsearch/517fe8b8-0b38-4646-bc8f-a27896454515%40googlegroups.com.
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>>>
>
>
