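Hi,

You can use a mapping char filter to remove the whitespace, and then an ngram tokenizer with min_gram=2 and max_gram=<whatever you like> to split the result into ngrams. (Note that this also produces ngrams that start in the middle of a token, such as “bc”, “bcd”…; I’m not sure whether you’d want to omit those or not.)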
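A quick way to see what this produces for the example in the question is a small simulation. This is plain Python, not Elasticsearch. `strip_spaces`, `ngram_tokens` and `ngrams_from_token_starts` are hypothetical helpers that mimic, respectively, a char filter that deletes spaces, an ngram tokenizer, and the tokenization asked for below. max_gram=6 is only an assumption (the length of “abcddc”, so nothing gets cut off).

```python
from typing import List


def strip_spaces(text: str) -> str:
    """Hypothetical stand-in for a mapping char filter that maps every space to nothing."""
    return text.replace(" ", "")


def ngram_tokens(text: str, min_gram: int, max_gram: int) -> List[str]:
    """Hypothetical stand-in for an ngram tokenizer: every substring whose
    length is between min_gram and max_gram, grouped by start offset."""
    tokens = []
    for start in range(len(text)):
        for length in range(min_gram, max_gram + 1):
            if start + length > len(text):
                break  # not enough characters left for longer grams
            tokens.append(text[start:start + length])
    return tokens


def ngrams_from_token_starts(text: str, min_gram: int, max_gram: int) -> List[str]:
    """What the question asks for: grams that begin only where an original
    whitespace-separated token begins, but may run on into later tokens."""
    words = text.split()
    joined = "".join(words)
    tokens = []
    offset = 0  # start of the current word inside the joined string
    for word in words:
        for length in range(min_gram, max_gram + 1):
            if offset + length > len(joined):
                break
            tokens.append(joined[offset:offset + length])
        offset += len(word)
    return tokens


if __name__ == "__main__":
    text = "ab c dd c"
    suggested = ngram_tokens(strip_spaces(text), 2, 6)  # char filter + ngram
    wanted = ngrams_from_token_starts(text, 2, 6)       # the desired list
    print(wanted)
    # ['ab', 'abc', 'abcd', 'abcdd', 'abcddc', 'cd', 'cdd', 'cddc', 'dd', 'ddc']
    print(sorted(set(suggested) - set(wanted)))
    # ['bc', 'bcd', 'bcdd', 'bcddc', 'dc']
```

Every token in the wanted list also appears in the char filter + ngram output; the only extras are grams that start mid-token (“bc” through “bcddc”, and “dc”, which starts inside “dd”).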
Masaru

On February 26, 2015 at 21:46:42, Ilija Subasic ([email protected]) wrote:
> Hi,
> I am trying to create a tokenizer that is going to create tokens
> looking something like this:
>
> "ab c dd c" would be tokenized as "ab", "abc", "abcd", "abcdd", "abcddc",
> "cd", "cdd", "cddc", "dd", "ddc"
>
> so basically I need something that is going to do ngram indexing from
> the start of each token. This is different
> than edge n-gram, which will tokenize each token separately.
> Any ideas on how to do this without coding a specific tokenizer?
>
> Thanks,
> Ilija
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/ec72da21-ed99-4cc8-829c-058467c020a5%40googlegroups.com.
>
> For more options, visit https://groups.google.com/d/optout.
