Re: Issue with using word delimiter filter

Amit Soni Mon, 21 Apr 2014 20:47:27 -0700

Hi everyone - I have changed the mapping so that it now looks like the one below.
However, for a given input, say 123-456-8989, the generated tokens are:


a) 123-456-8989 b) 123 c) 456 d) 8989 e) 1234568989

I was expecting just two tokens: a) 123-456-8989 b) 1234568989

Would you know what might be going wrong here?

"default_index": {
  "tokenizer": "keyword",
  "filter": [
    "lowercase"
  ]
},

"phoneAnalyzer": {
  "type": "custom",
  "tokenizer": "keyword",
  "filter": [
    "word_delimiter_for_phone"
  ]
},

"word_delimiter_for_phone": {
  "type": "word_delimiter",
  "catenate_all": true,
  "generate_number_parts ": false,
  "split_on_case_change": false,
  "generate_word_parts": false,
  "split_on_numerics": false,
  "preserve_original": true
},

-Amit.
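
A possible cause, offered as an assumption rather than a confirmed diagnosis: the key "generate_number_parts " in the filter definition above is written with a trailing space inside the quotes. If Elasticsearch looks the option up by its exact name, that entry would be ignored and generate_number_parts would keep its default of true, which would account for the extra tokens 123, 456 and 8989. A sketch of the same filter with the key spelled without the space, all other options unchanged:

```json
"word_delimiter_for_phone": {
  "type": "word_delimiter",
  "catenate_all": true,
  "generate_number_parts": false,
  "split_on_case_change": false,
  "generate_word_parts": false,
  "split_on_numerics": false,
  "preserve_original": true
}
```

With this definition the expected tokens for 123-456-8989 would be the original entry and 1234568989, but it is worth confirming by running the analyzer through the _analyze API after recreating the index.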


On Fri, Nov 1, 2013 at 1:07 AM, David Pilato <[email protected]> wrote:

> Sorry. Forget my answer. Useless here.
>
>
> --
> David ;-)
> Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
>
> On 1 Nov 2013, at 08:05, David Pilato <[email protected]> wrote:
>
> Or disable analysis for this field.
>
> HTH
>
> --
> David ;-)
> Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
>
> On 1 Nov 2013, at 07:42, [email protected] wrote:
>
> Analysis starts with the tokenizer, which in your case is "standard".
> Therefore the input "345 678-1234" is first tokenized into "345", "678", and
> "1234", and only then are the filters applied. One way to get both the
> original and the concatenated input would be to use the "keyword" tokenizer.
>
> On Thursday, October 31, 2013 8:10:55 PM UTC+1, amit.soni wrote:
>>
>> Hi all - I have a phone number field and I am trying to use the
>> word_delimiter filter in order to break it up into tokens, preserve the
>> original entry, and concatenate all the numbers in the entry. I have the
>> following configuration:
>>
>> "phoneAnalyzer" :  {
>>                     "type": "custom",
>>                     "tokenizer": "standard",
>>                     "filter": [
>>                         "word_delimiter_for_phone"
>>                     ]
>>                 }
>>
>> "filter": {
>>                 "word_delimiter_for_phone": {
>>                     "type": "word_delimiter",
>>                     "catenate_numbers" : true,
>>                     "preserve_original" : true
>>                 },
>> }
>>
>> Using this, when I run it on input "345 678-1234" I get the following:
>>
>> {
>>   "tokens" : [ {
>>     "token" : "345",
>>     "start_offset" : 0,
>>     "end_offset" : 3,
>>     "type" : "<NUM>",
>>     "position" : 1
>>   }, {
>>     "token" : "678",
>>     "start_offset" : 4,
>>     "end_offset" : 7,
>>     "type" : "<NUM>",
>>     "position" : 2
>>   }, {
>>     "token" : "1234",
>>     "start_offset" : 8,
>>     "end_offset" : 12,
>>     "type" : "<NUM>",
>>     "position" : 3
>>   } ]
>> }
>>
>> Question: Shouldn't this also have generated a concatenated token of
>> the form 3456781234?
>>
>> Anything I am missing here?
>>
>> -Amit.
>>
>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/groups/opt_out.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAAOGaQKiEQhwJFfVwTBEHkeF%2BCK%2B8zpw6WC%2BpmSDeUgjTFtN2Q%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.
