I also tried Clint's example: I mapped the analyzer to a document field and
searched that field, but I'm still getting HTML in the query results. Here is
my code. I appreciate the help.
//Analyzer settings
PUT /foo/
{
  "settings": {
    "index" : {
      "analysis" : {
        "analyzer" : {
          "test_1" : {
            "char_filter" : [
              "html_strip"
            ],
            "tokenizer" : "standard"
          }
        }
      }
    }
  }
}
//Mapping
PUT /foo/foo_type/_mapping
{
  "foo_type":{
    "properties" : {
      "title": {
        "type":"string",
        "index": "analyzed",
        "analyzer":"test_1"
      }
    }
  }
}
GET /foo/foo_type/_mapping
{
  "foo": {
    "mappings": {
      "foo_type": {
        "properties": {
          "date": {
            "type": "date",
            "format": "dateOptionalTime"
          },
          "title": {
            "type": "string",
            "analyzer": "test_1"
          }
        }
      }
    }
  }
}
////Index/////////////
PUT /foo/foo_type/1
{
  "date" : "2009-11-15T14:12:12",
  "title" : "The quick & <b>brown</b> fox"
}
//Search //////////
GET /foo/_search?pretty=true
{
  "fields": ["title"],
  "query": {
    "query_string": {
      "query": "brown",
      "analyzer": "test_1"
    }
  }
}
//Results still show the html tags//////
"hits": [
{
"_index": "foo",
"_type": "foo_type",
"_id": "1",
"_score": 0.076713204,
"fields": {
"title": [
"The quick & <b>brown</b> fox"
]
}
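
A possible explanation, sketched as a toy model rather than real Elasticsearch code: a char filter such as html_strip only changes the tokens written to the index, while the value returned in "fields" comes from the stored document, which is kept verbatim. The function names below (strip_html, analyze, index_document) are hypothetical, and the regex is only a rough stand-in for what html_strip and the standard tokenizer do.

```python
import re


def strip_html(text):
    """Toy stand-in for the html_strip char filter: drop anything that looks like a tag."""
    return re.sub(r"<[^>]+>", "", text)


def analyze(text):
    """Toy stand-in for the test_1 analyzer: strip tags, then split into word tokens."""
    return re.findall(r"\w+", strip_html(text))


def index_document(doc, analyzed_field):
    """Return (stored_source, indexed_tokens) for one document."""
    stored = dict(doc)                     # stored copy is kept verbatim, tags included
    tokens = analyze(doc[analyzed_field])  # only these tokens are searched
    return stored, tokens


doc = {"date": "2009-11-15T14:12:12", "title": "The quick & <b>brown</b> fox"}
stored, tokens = index_document(doc, "title")

print(tokens)           # tokens a query for "brown" would match against
print(stored["title"])  # what comes back in the hit: the original string
```

Under this model the search for "brown" matches because of the token, and the tags in the hit are expected because the hit shows the stored text. In the same way, the earlier twitter document would still match "html", since the word also appears in the text itself ("an html test"), not just as a tag.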
On Thursday, August 7, 2014 6:06:56 PM UTC-4, Jörg Prante wrote:
>
> Have you checked Clint's example?
>
> https://gist.github.com/clintongormley/780895
>
> Jörg
>
>
> On Thu, Aug 7, 2014 at 8:23 PM, IronMike <[email protected]> wrote:
>
>> I would like to strip html tags for indexing. Here is a simple example I
>> tried so far, but it doesn't seem to strip the html tags. Any ideas what's
>> missing?
>>
>> //settings & Mappings
>> POST twitter
>> {
>> "mappings": {
>> "tweet" : {
>> "properties" : {
>> "message" : {
>> "type" : "string",
>> "analyzer": "strip_html_analyzer"
>> },
>> "date" : {
>> "type" : "date"
>> },
>> "name" : {
>> "type" : "string"
>> }
>> }
>> }
>> },
>> "settings": {
>> "analysis": {
>> "analyzer": {
>> "strip_html_analyzer":{
>> "type":"custom",
>> "tokenizer":"standard",
>> "filter":"standard",
>> "char_filter":"my_html"
>> }
>> },
>> "char_filter": {
>> "my_html":{
>> "type":"html_strip"
>> }
>> }
>> }
>> }
>> }
>>
>>
>> //Index a document
>> PUT /twitter/tweet/1
>> {
>> "name" : "mike",
>> "date" : "2009-11-15T14:12:12",
>> "message" : "<html>trying out <b>Elasticsearch</b>, This is an html test</html>"
>> }
>>
>>
>> //query result for "html": I expected the query to return nothing, since
>> the analyzer is supposed to strip the tags
>> "hits": {
>> "total": 1,
>> "max_score": 0.11626227,
>> "hits": [
>> {
>> "_index": "twitter",
>> "_type": "tweet",
>> "_id": "1",
>> "_score": 0.11626227,
>> "fields": {
>> "message": [
>> "<html>trying out <b>Elasticsearch</b>, This is an html test</html>"
>> ]
>> },
>> "highlight": {
>> "message": [
>> "<html>trying out <b>Elasticsearch</b>, This is an <em>html</em> test</html>"
>> ]
>> }
>> }
>> ]
>> }
>>
>>
>>
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/elasticsearch/517fe8b8-0b38-4646-bc8f-a27896454515%40googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>