[ http://issues.apache.org/jira/browse/NUTCH-237?page=comments#action_12371687 ]
Dawid Weiss commented on NUTCH-237:
-----------------------------------
Yes and no. I removed the "support" for foreign languages from the constructor
code:
    // We initialize Lingo with English stemming and stopwords. Lingo has
    // a simple language detection filter, but you'll be better off hardcoding
    // the language according to your needs. If you have bilingual indices,
    // then there is a possibility of creating a more complex process that assigns
    // a language tag before the clustering is actually started.
    return new LingoLocalFilterComponent(
        new Language[] { new English() },
        defaults);
}
Language detection is not really brilliant in the open source Lingo so I
thought it wouldn't make sense to give people false hopes. Now, all the
stemmers and stopword lists are still included in the release (look inside
carrot2-util-tokenizer.jar$/com/dawidweiss/carrot/util/tokenizer/languages/...)
so you can freely switch to another language by changing the instantiated
language.
I have a better idea though -- how about if you apply this patch (because I've
tested it and know it works) and I'll make the language configurable via ISO
codes set in the Nutch configuration? The default would be English and you could
set your own language in there if you wanted to. All right?
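
To make the proposal concrete, here is a rough sketch of how an ISO code read from configuration could be resolved with English as the fallback. It is only an illustration of the idea, not the patch itself: the property name, the class name and the set of supported codes are all made up for the example, and plain java.util.Properties stands in for the Nutch configuration object.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Properties;

public class ClusteringLanguageConfig {
    // Hypothetical property name, used for illustration only;
    // it is not an existing Nutch configuration key.
    static final String LANGUAGE_KEY = "extension.clustering.carrot2.language";

    // English is the default, as proposed in the comment above.
    static final String DEFAULT_LANGUAGE = "en";

    // Illustrative set of ISO 639-1 codes. A real plugin would map each
    // code to the matching stemmer/stopword implementation instead of a name.
    static final Map<String, String> SUPPORTED = Map.of(
        "en", "English",
        "de", "German",
        "fr", "French");

    // Returns the configured ISO code, or the default when the property
    // is missing or names a language that is not in the supported set.
    static String resolveLanguage(Properties conf) {
        String code = conf.getProperty(LANGUAGE_KEY, DEFAULT_LANGUAGE)
            .trim()
            .toLowerCase(Locale.ROOT);
        return SUPPORTED.containsKey(code) ? code : DEFAULT_LANGUAGE;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(LANGUAGE_KEY, "de");
        System.out.println(SUPPORTED.get(resolveLanguage(conf)));
    }
}
```

Falling back to English on an unknown code, rather than failing, keeps the current behaviour for anyone who does not touch the setting.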
> Carrot2 clustering plugin upgrade.
> ----------------------------------
>
> Key: NUTCH-237
> URL: http://issues.apache.org/jira/browse/NUTCH-237
> Project: Nutch
> Type: Improvement
> Reporter: Dawid Weiss
> Priority: Trivial
> Attachments: c2.patch, libs.zip, svn-stat.txt
>
> This is an upgrade of the clustering plugin to the newest release (1.0.2).