OK. I was able to enable the language identifier plugin by adding the
value to the plugin.includes attribute in nutch-site.xml, but I'm not
sure that doing just that is enough to have Thai text recognized and
tokenized properly.
What else do I have to do? Please help me.

1. You must create a Thai NGP (n-gram profile) file so that the language
identifier can identify Thai.
2. You must create a Thai analyzer (see, for instance, the analysis-fr and
analysis-de sample analyzers).
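
To give a rough idea of what an n-gram profile contains, here is a minimal
sketch in Java. It is not Nutch's actual profile-building code: the class
name NGramProfileSketch, the methods countNGrams and topNGrams, and the
n-gram lengths are all assumptions made for illustration. It only shows the
general idea, which is to count the character n-grams in a sample text and
keep the most frequent ones as the language's fingerprint.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: hypothetical names, not part of the Nutch API.
public class NGramProfileSketch {

    /** Counts every character n-gram of length minLen..maxLen in the text. */
    public static Map<String, Integer> countNGrams(String text, int minLen, int maxLen) {
        Map<String, Integer> counts = new HashMap<>();
        for (int n = minLen; n <= maxLen; n++) {
            for (int i = 0; i + n <= text.length(); i++) {
                counts.merge(text.substring(i, i + n), 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Returns up to `limit` n-grams, most frequent first; ties are sorted alphabetically. */
    public static List<String> topNGrams(Map<String, Integer> counts, int limit) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> top = new ArrayList<>();
        for (int i = 0; i < Math.min(limit, entries.size()); i++) {
            top.add(entries.get(i).getKey());
        }
        return top;
    }

    public static void main(String[] args) {
        // Placeholder sample; a real profile would be built from a large Thai corpus.
        String sample = args.length > 0 ? args[0] : "abab";
        Map<String, Integer> counts = countNGrams(sample, 1, 2);
        for (String gram : topNGrams(counts, 10)) {
            System.out.println(gram + " " + counts.get(gram));
        }
    }
}
```

A language identifier built on such profiles would compare the ranked
n-grams of an unknown document against the stored profile of each language
and pick the closest one, which is why a Thai profile has to exist before
Thai can be detected.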

Best Regards

Jérôme
