Hi Nandiya,

Have a look at Lucene and its source code for token filters. You'd implement a custom stemmer at the Lucene level (as a token filter) and then just use that in ES.
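In practice this splits into two pieces: a stemming function, and a Lucene `TokenFilter` that applies it to each token. Here is a minimal sketch of the first piece only. It assumes a hypothetical class name `PaliStemmer` and a placeholder suffix list that is NOT real Pali morphology, so you would swap in the rules from your existing Python stemmer. The sketch follows the shape of Lucene's built-in "light" stemmers, which take a `char[]` buffer and a length and return the new length. That shape lets a filter's `incrementToken()` call it with `termAtt.buffer()` and `termAtt.length()` from its `CharTermAttribute`, then truncate the token with `termAtt.setLength(...)`.

```java
/**
 * Hypothetical suffix-stripping stemmer, shaped like Lucene's light stemmers:
 * it reads a char buffer and returns the stemmed length, leaving the actual
 * truncation to the caller (e.g. CharTermAttribute.setLength in a TokenFilter).
 * The suffix list is an illustrative placeholder, NOT real Pali morphology.
 */
class PaliStemmer {
  // Placeholder endings, longest first so "ena" is tried before "e".
  private static final String[] SUFFIXES = {"assa", "ehi", "esu", "ena", "a", "e"};
  // Never shorten a token below this many characters.
  private static final int MIN_STEM = 3;

  /** Returns the length of the stem of s[0..len); the buffer itself is not modified. */
  public int stem(char[] s, int len) {
    for (String suffix : SUFFIXES) {
      if (len - suffix.length() >= MIN_STEM && endsWith(s, len, suffix)) {
        return len - suffix.length();
      }
    }
    return len; // no known suffix: keep the token unchanged
  }

  // True if s[0..len) ends with the given suffix.
  private static boolean endsWith(char[] s, int len, String suffix) {
    int offset = len - suffix.length();
    if (offset < 0) {
      return false;
    }
    for (int i = 0; i < suffix.length(); i++) {
      if (s[offset + i] != suffix.charAt(i)) {
        return false;
      }
    }
    return true;
  }

  /** Convenience wrapper for plain strings, handy for unit tests. */
  public String stem(String word) {
    char[] buf = word.toCharArray();
    return new String(buf, 0, stem(buf, buf.length));
  }

  public static void main(String[] args) {
    PaliStemmer stemmer = new PaliStemmer();
    for (String w : new String[] {"buddhassa", "buddhena", "buddhehi", "buddhe", "buddha"}) {
      System.out.println(w + " -> " + stemmer.stem(w));
    }
  }
}
```

With this placeholder list, all five sample forms reduce to `buddh`, which is the conflation a stemmer is for. To get ES to call it, you would still need the `TokenFilter` wrapper and an Elasticsearch token filter factory registered through a plugin. Those classes depend on the exact Lucene/ES versions you build against, so they are left out of this sketch.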
Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/

On Monday, July 7, 2014 8:57:09 PM UTC-4, Nandiya Bhikkhu wrote:
>
> I am interested in using Elasticsearch for our website suttacentral.net.
> I've tried ES and found it pleasant to use, with obvious power. The only
> challenge is that on suttacentral we host many Buddhist texts in ancient
> languages, particularly Pali, and suffice it to say there are no
> existing stemmers. Stemming is a vital step for searching because Pali is a
> highly inflected language (like Latin). The actual stemming step is
> straightforward enough: presently we use a custom stemmer I wrote in
> Python. It's dead simple, and I wouldn't have much trouble implementing the
> same code in Java (i.e. as a function which takes an inflected word as a
> string and returns the stem as another string). Where I'm in the dark is
> making ES call that code.
>
> All the example stemmer plugins I've found adapt existing stemmers
> to ES. What I really want is a way to call a function on each token
> and use the return value of that function. It seems to me that *should*
> be simple enough, but I've not managed to find any simple, minimal code
> to use as a template. Although it would be noble, at this point I'm not
> interested in making a proper plugin; I would be happy with the barest
> bodge/hack that would achieve the desired effect!
>
> If anyone could point me in the right direction, either to a minimal
> code example or an outline of what it would involve, I would be very
> grateful.
>
> Kind regards,
> Nandiya Bhikkhu
