Hi,

I am trying to create a tokenizer that produces tokens like this: "ab c dd c" would be tokenized as "ab", "abc", "abcd", "abcdd", "abcddc", "cd", "cdd", "cddc", "dd", "ddc".

So basically I need something that does n-gram indexing from the start of each token, continuing across the tokens that follow it. This is different from edge n-gram, which tokenizes each token separately. Any ideas on how to do this without coding a custom tokenizer?

Thanks,
Ilija

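To make the expected output concrete, here is a minimal Python sketch of the behavior described above. It is only an illustration of the desired result, not Elasticsearch configuration or an existing analyzer. The function name `cross_token_edge_ngrams` and the `min_gram=2` cutoff are assumptions inferred from the example, where no single-character token appears in the output.

```python
def cross_token_edge_ngrams(text, min_gram=2):
    """Return prefixes that start at each token and run across the following tokens.

    Hypothetical helper: min_gram=2 is an assumption inferred from the example,
    which contains no single-character tokens.
    """
    tokens = text.split()  # whitespace tokenization, as in the example
    grams = []
    for i in range(len(tokens)):
        # Join this token with everything after it, dropping the spaces.
        joined = "".join(tokens[i:])
        # Emit every prefix of the joined string that is at least min_gram long.
        for end in range(max(min_gram, 1), len(joined) + 1):
            grams.append(joined[:end])
    return grams


if __name__ == "__main__":
    print(cross_token_edge_ngrams("ab c dd c"))
```

For the input "ab c dd c" this yields the ten tokens listed in the question, in the same order. The last token "c" contributes nothing because it is shorter than the assumed minimum length.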