OK. I was able to enable the language identifier plugin by adding it to the plugin.includes property in nutch-site.xml, but I'm not sure that this alone is enough to have Thai text recognized and tokenized properly. What else do I have to do? Please help me.
1. You must create a Thai NGP (n-gram profile) file so that the language identifier can identify Thai.
2. You must create a Thai analyzer (see, for instance, the analysis-fr and analysis-de sample analyzers).

Best regards,
Jérôme
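
To make the two steps above a bit more concrete, here is a rough, self-contained Java sketch of the ideas behind them. It is not Nutch code: the class name ThaiTextSketch and its methods are made up for illustration, the n-gram ranking is only the general idea behind a language profile (not the actual NGP file format), and using the JDK's BreakIterator with a Thai locale is my assumption about one possible way to find word boundaries in text that has no spaces. A real analyzer plugin would still need to be wired into Nutch the way analysis-fr and analysis-de are.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper class, not part of Nutch.
final class ThaiTextSketch {

    // Count every character n-gram of length minLen..maxLen in the text.
    // Works on code points so that non-BMP characters are not split.
    static Map<String, Integer> countNgrams(String text, int minLen, int maxLen) {
        Map<String, Integer> counts = new HashMap<>();
        int[] cps = text.codePoints().toArray();
        for (int n = minLen; n <= maxLen; n++) {
            for (int i = 0; i + n <= cps.length; i++) {
                counts.merge(new String(cps, i, n), 1, Integer::sum);
            }
        }
        return counts;
    }

    // Rank n-grams by descending count (ties broken alphabetically)
    // and keep the first `limit` entries: a toy "profile".
    static List<String> topNgrams(Map<String, Integer> counts, int limit) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> top = new ArrayList<>();
        for (int i = 0; i < entries.size() && i < limit; i++) {
            top.add(entries.get(i).getKey());
        }
        return top;
    }

    // Split text at the word boundaries reported by the JDK for the given
    // locale, dropping whitespace-only pieces. The quality of the Thai
    // boundaries depends on the JDK in use (assumption, not verified here).
    static List<String> segment(String text, Locale locale) {
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(text);
        List<String> tokens = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String piece = text.substring(start, end);
            if (!piece.trim().isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String sample = "ภาษาไทยไม่มีช่องว่างระหว่างคำ"; // sample Thai sentence
        System.out.println(topNgrams(countNgrams(sample, 1, 3), 10));
        System.out.println(segment(sample, Locale.forLanguageTag("th")));
    }
}
```

The first half shows why step 1 needs a reasonably large Thai training text: the profile is just the most frequent n-grams, so a small sample gives a poor ranking. The second half shows why step 2 matters: a whitespace-based tokenizer would treat the whole Thai sentence as one token, so the analyzer has to do its own word segmentation.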
