I am interested in using Elasticsearch for our website, suttacentral.net. I've tried ES and found it pleasant to use, with obvious power. The only challenge is that on SuttaCentral we host many Buddhist texts in ancient languages, particularly Pali, and suffice to say there are no existing stemmers for it. Stemming is a vital step for searching because Pali is a highly inflected language (like Latin). The actual stemming step is straightforward enough: presently we use a custom stemmer I wrote in Python. It's dead simple, and I wouldn't have much trouble implementing the same code in Java (i.e. as a function which takes an inflected word as a string and returns the stem as another string). Where I'm in the dark is making ES call that code.
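To make the shape of that function concrete, here is a rough Java sketch of what I have in mind. The class name `StemSketch`, the `ENDINGS` list, the `MIN_STEM_LENGTH` cutoff and the `stemAll` helper are all made up for illustration. In particular, the endings are placeholders and not my real Pali rules.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class StemSketch {
    // Hypothetical endings, longest first. Placeholders only, NOT the real Pali rule set.
    private static final String[] ENDINGS = {"assa", "anam", "ena", "am", "o"};

    // Assumed cutoff: never strip an ending if fewer than this many characters would remain.
    private static final int MIN_STEM_LENGTH = 3;

    /** Takes an inflected word as a string and returns the stem as another string. */
    public static String stem(String word) {
        for (String ending : ENDINGS) {
            int stemLength = word.length() - ending.length();
            if (word.endsWith(ending) && stemLength >= MIN_STEM_LENGTH) {
                return word.substring(0, stemLength);
            }
        }
        // No ending matched (or the word is too short): return it unchanged.
        return word;
    }

    /** The behaviour I want from ES: call stem() on each token and use the return value. */
    public static List<String> stemAll(List<String> tokens) {
        List<String> stems = new ArrayList<>();
        for (String token : tokens) {
            stems.add(stem(token));
        }
        return stems;
    }

    public static void main(String[] args) {
        // Prints [buddh, dhamm, so]
        System.out.println(stemAll(Arrays.asList("buddhassa", "dhammena", "so")));
    }
}
```

My guess is that the ES side amounts to wrapping something like `stem()` in a Lucene `TokenFilter`, but that wiring is exactly the part I haven't worked out.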
All the example stemmer plugins I've found adapt existing stemmers to ES. What I really want is just a way to call a function on each token and use the return value of that function. It seems to me that this *should* be simple enough, but I've not managed to find any simple, minimal code to use as a template. Although it would be noble, at this point I'm not interested in making a proper plugin; I would be happy with the barest bodge/hack that achieves the desired effect! If anyone could point me in the right direction, either to a minimal code example or an outline of what it would involve, I would be very grateful.

Kind regards,
Nandiya Bhikkhu
