whitespace tokenizer not working as I'd expect

Craig Ching Thu, 12 Mar 2015 07:42:16 -0700

Hi all,

I'm trying to break up some strings for use in full text search while leaving
the original field intact.  I have created a "full_text" field that is
populated from a "name" field using "copy_to" and analyzed with a custom
analyzer that looks like this:

    "settings" : {
        "analysis": {
            "char_filter" : {
                "full_text_mapping" : {
                    "type": "mapping",
                    "mappings" : [".=>%20", "_=>%20"]
                }
            },
            "analyzer" : {
                "full_text_analyzer" : {
                    "type" : "custom",
                    "char_filter" : "full_text_mapping",
                    "tokenizer" : "whitespace",
                    "filter" : ["lowercase"]
                }
            }
        }
    },

As you can see, I'm trying to convert '.' and '_' to ' ' before the
whitespace tokenizer kicks in.  My understanding is that the char_filter
replaces those characters with whitespace, the whitespace tokenizer then
splits on it, and so every component becomes searchable.  For instance, I
would expect "GRIZZLY.BEAR" to be found by both "grizzly" and "bear".  But
with the whitespace tokenizer I can't find the document with either term.
So what am I not understanding?  Here is the full script showing what I'm
doing:

#!/bin/sh

ES=localhost:9200

echo ">>> Deleting _all"
curl -XDELETE $ES/_all

echo ">>> Creating the index 'animals'"
curl -XPUT $ES/animals -d'
{
    "settings" : {
        "analysis": {
            "char_filter" : {
                "full_text_mapping" : {
                    "type": "mapping",
                    "mappings" : [".=>%20", "_=>%20"]
                }
            },
            "analyzer" : {
                "full_text_analyzer" : {
                    "type" : "custom",
                    "char_filter" : "full_text_mapping",
                    "tokenizer" : "whitespace",
                    "filter" : ["lowercase"]
                }
            }
        }
    },
    "mappings" : {
        "bear" : {
            "properties" : {
                "suggest" : {
                    "type" : "completion",
                    "analyzer" : "simple",
                    "payloads" : true
                },
                "full_text" : {
                    "type" : "string",
                    "analyzer" : "full_text_analyzer"
                },
                "name" : {
                    "type" : "string",
                    "index" : "not_analyzed",
                    "copy_to" : "full_text"
                }
            }
        }
    }
}' && echo

echo ">>> Indexing the GRIZZLY.BEAR document"
curl -XPOST $ES/animals/bear -d'
{
    "name": "GRIZZLY.BEAR"
}
' && echo

curl -XPOST $ES/animals/_flush && echo
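
# Optional check: list the tokens full_text_analyzer produces for the raw
# name value.  Assumes the ES 1.x form of the _analyze API (analyzer as a
# query parameter, text as the request body); the syntax differs in later
# Elasticsearch versions.
echo ">>> Analyzing GRIZZLY.BEAR with full_text_analyzer"
curl "$ES/animals/_analyze?analyzer=full_text_analyzer&pretty" -d 'GRIZZLY.BEAR' && echo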

# Search for the document using the name
echo
echo ">>> Searching for name:GRIZZLY.BEAR"
echo
curl $ES/animals/bear/_search -d'
{
    "query" : {
        "match" : {
            "name" : "GRIZZLY.BEAR"
        }
    }
}
' && echo

# Search the analyzed full_text field for the first component
echo
echo ">>> Searching for full_text:grizzly"
echo
curl $ES/animals/bear/_search -d'
{
    "query" : {
        "match" : {
            "full_text" : "grizzly"
        }
    }
}
' && echo

# Search the analyzed full_text field for the second component
echo
echo ">>> Searching for full_text:bear"
echo
curl $ES/animals/bear/_search -d'
{
    "query" : {
        "match" : {
            "full_text" : "bear"
        }
    }
}
' && echo
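
One way to see what the analysis chain might do to "GRIZZLY.BEAR" without a
cluster is a rough approximation in plain sh: sed stands in for the mapping
char filter, and tr for the whitespace tokenizer and lowercase filter.  It
assumes the mapping char filter inserts its right-hand side literally, i.e.
that "%20" is not URL-decoded into a space.  The helper names char_filter and
tokenize are hypothetical, and this is a sketch, not Elasticsearch itself.

```sh
# Rough local approximation of full_text_analyzer; this is NOT Elasticsearch.
# Assumption: the mapping char filter inserts its right-hand side literally,
# so ".=>%20" yields the three characters "%20" rather than a space.

# Char filter stand-in: replace every '.' and '_' with $2, used verbatim
# (the replacement must not contain '/', '&' or '\', which sed treats specially).
char_filter() {
    printf '%s\n' "$1" | sed "s/[._]/$2/g"
}

# Whitespace tokenizer + lowercase filter stand-in: prints one token per line,
# dropping empty lines left behind by consecutive whitespace.
tokenize() {
    printf '%s\n' "$1" | tr ' \t' '\n\n' | tr '[:upper:]' '[:lower:]' | sed '/^$/d'
}

echo ">>> Tokens when '.' maps to the literal text %20"
tokenize "$(char_filter 'GRIZZLY.BEAR' '%20')"

echo ">>> Tokens when '.' maps to a real space"
tokenize "$(char_filter 'GRIZZLY.BEAR' ' ')"
```

Under that assumption the first call prints the single token grizzly%20bear,
which matches neither "grizzly" nor "bear", while the second prints grizzly and
bear as separate tokens.  The _analyze API on the animals index remains the
authoritative way to see what Elasticsearch itself produces.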

I appreciate any help with this!

Cheers,
Craig

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/5fa2347f-3019-4973-9d67-7f18b3dfee9e%40googlegroups.com.
