tokenization help mixed n-grams

Ilija Subasic Thu, 26 Feb 2015 04:47:26 -0800

Hi,
  I am trying to create a tokenizer that produces tokens like this:


"ab c dd c" would be tokenized as "ab", "abc", "abcd", "abcdd", "abcddc", 
"cd", "cdd", "cddc", "dd", "ddc"

So basically I need something that does n-gram indexing from the start of 
each token, running on through the tokens that follow it. This is different 
from edge n-gram, which tokenizes each token separately.
Any ideas on how to do this without coding a custom tokenizer?
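
To make the expected output concrete, here is a rough Python sketch of the 
behaviour I am after. It is not Elasticsearch code, only an illustration: the 
function name mixed_ngrams and the min_gram parameter are made up, and the 
minimum length of 2 is an assumption inferred from the example, which 
contains no single-character tokens.

```python
def mixed_ngrams(text, min_gram=2):
    """Emit prefixes of the whitespace-stripped text, starting at each token.

    min_gram=2 is an assumption taken from the example in this post,
    where no one-character token such as "c" appears in the output.
    """
    tokens = text.split()
    grams = []
    for start in range(len(tokens)):
        # Join this token with everything after it, dropping the spaces.
        joined = "".join(tokens[start:])
        # Emit every prefix of the joined string from min_gram characters up.
        for end in range(min_gram, len(joined) + 1):
            grams.append(joined[:end])
    return grams


if __name__ == "__main__":
    print(mixed_ngrams("ab c dd c"))
```

Note that the prefixes are allowed to end in the middle of a later token 
("abcd" stops inside "dd"), which is what a per-token edge n-gram cannot 
produce.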

Thanks,
Ilija
