Jérôme Charron wrote:
jar. A short-term solution could be to move the core classes (which have no
dependencies on Nutch) to a new lib-plugin (lib-lang, for instance) and add a
dependency on this plugin in the language-identifier, so that this code could
be used as a standalone lib.

Are you OK with such changes?

Perhaps you could isolate the n-gram-specific code into its own plugin and the lang-id code into another.

Or the other option would be something like this (which I implemented some time ago), since the n-gram categorizer can also be used for other
interesting things:

A new package in core Nutch containing classes like:

NGramProfile - pretty much as is
Categorizer - a generic, configurable n-gram categorizer (profiles, n-gram sizes, etc.)
CategorizerFactory - to get hold of different categorizers
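A minimal sketch of what NGramProfile and a generic Categorizer might look like follows. Only the class names come from this proposal. The internals are assumptions: the profile stores normalized frequencies for all n-grams between minN and maxN, and the distance is a simple sum of absolute frequency differences. This is not necessarily what the existing Nutch NGramProfile does.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: class names follow the proposal, internals are assumptions. */
public class NGramCategorizerSketch {

    /** Normalized frequencies of all n-grams with minN <= n <= maxN. */
    static final class NGramProfile {
        final String name;
        final Map<String, Double> freqs = new HashMap<>();

        NGramProfile(String name, String text, int minN, int maxN) {
            this.name = name;
            String padded = " " + text.toLowerCase() + " "; // pad so word edges form n-grams
            int total = 0;
            for (int n = minN; n <= maxN; n++) {
                for (int i = 0; i + n <= padded.length(); i++) {
                    freqs.merge(padded.substring(i, i + n), 1.0, Double::sum);
                    total++;
                }
            }
            if (total > 0) {
                final double t = total;
                freqs.replaceAll((k, v) -> v / t); // counts -> relative frequencies
            }
        }

        /** Sum of absolute frequency differences over the union of n-grams (0 = identical). */
        double distance(NGramProfile other) {
            double d = 0;
            for (Map.Entry<String, Double> e : freqs.entrySet()) {
                d += Math.abs(e.getValue() - other.freqs.getOrDefault(e.getKey(), 0.0));
            }
            for (Map.Entry<String, Double> e : other.freqs.entrySet()) {
                if (!freqs.containsKey(e.getKey())) d += e.getValue();
            }
            return d;
        }
    }

    /** Generic categorizer, configured with reference profiles and n-gram sizes. */
    static final class Categorizer {
        private final Map<String, NGramProfile> profiles = new HashMap<>();
        private final int minN, maxN;

        Categorizer(int minN, int maxN) {
            this.minN = minN;
            this.maxN = maxN;
        }

        void addProfile(String category, String trainingText) {
            profiles.put(category, new NGramProfile(category, trainingText, minN, maxN));
        }

        /** Returns the category of the closest profile, or null if none is configured. */
        String categorize(String text) {
            NGramProfile doc = new NGramProfile("doc", text, minN, maxN);
            String best = null;
            double bestDist = Double.MAX_VALUE;
            for (NGramProfile p : profiles.values()) {
                double d = doc.distance(p);
                if (d < bestDist) {
                    bestDist = d;
                    best = p.name;
                }
            }
            return best;
        }
    }

    public static void main(String[] args) {
        Categorizer c = new Categorizer(1, 3);
        c.addProfile("en", "the quick brown fox jumps over the lazy dog and the cat");
        c.addProfile("de", "der schnelle braune fuchs springt über den faulen hund und die katze");
        System.out.println(c.categorize("the dog and the fox"));
    }
}
```

Because the categorizer only knows about profiles and n-gram sizes, the same class could in principle be configured for tasks other than language identification.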

In the LangId plugin you just get a correctly configured categorizer from the factory (one set up to use the language n-gram profiles and predefined settings for n-gram sizes, etc.) and tell it to do its job when needed.
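The factory-based wiring could look roughly like the sketch below. CategorizerFactory is named in the proposal. The `register`/`get` methods, the "lang" key, and the LanguageIdentifierPlugin class are hypothetical. To keep the sketch self-contained, the "lang" categorizer is a simple trigram-overlap scorer standing in for the real n-gram profiles.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Hypothetical sketch of the factory plus the lang-id plugin that uses it. */
public class CategorizerFactorySketch {

    /** Minimal stand-in for the generic Categorizer: text in, category out. */
    interface Categorizer {
        String categorize(String text);
    }

    /** Builds each named, preconfigured categorizer once and then returns the cached instance. */
    static final class CategorizerFactory {
        private final Map<String, Supplier<Categorizer>> builders = new ConcurrentHashMap<>();
        private final Map<String, Categorizer> cache = new ConcurrentHashMap<>();

        void register(String name, Supplier<Categorizer> builder) {
            builders.put(name, builder);
        }

        Categorizer get(String name) {
            return cache.computeIfAbsent(name, n -> {
                Supplier<Categorizer> b = builders.get(n);
                if (b == null) throw new IllegalArgumentException("no categorizer configured: " + n);
                return b.get();
            });
        }
    }

    /** The lang-id plugin holds no n-gram logic; it asks the factory and delegates. */
    static final class LanguageIdentifierPlugin {
        private final CategorizerFactory factory;

        LanguageIdentifierPlugin(CategorizerFactory factory) {
            this.factory = factory;
        }

        String identify(String text) {
            return factory.get("lang").categorize(text);
        }
    }

    /** Placeholder "lang" categorizer: picks the sample sharing the most trigrams with the text. */
    static Categorizer trigramOverlap(Map<String, String> samples) {
        Map<String, Set<String>> profiles = new HashMap<>();
        samples.forEach((lang, s) -> profiles.put(lang, trigrams(s)));
        return text -> {
            Set<String> doc = trigrams(text);
            String best = null;
            int bestScore = -1;
            for (Map.Entry<String, Set<String>> e : profiles.entrySet()) {
                Set<String> common = new HashSet<>(doc);
                common.retainAll(e.getValue());
                if (common.size() > bestScore) {
                    bestScore = common.size();
                    best = e.getKey();
                }
            }
            return best;
        };
    }

    static Set<String> trigrams(String s) {
        String p = " " + s.toLowerCase() + " ";
        Set<String> out = new HashSet<>();
        for (int i = 0; i + 3 <= p.length(); i++) out.add(p.substring(i, i + 3));
        return out;
    }

    public static void main(String[] args) {
        CategorizerFactory factory = new CategorizerFactory();
        Map<String, String> samples = new HashMap<>();
        samples.put("en", "the quick brown fox jumps over the lazy dog");
        samples.put("de", "der schnelle braune fuchs springt über den faulen hund");
        factory.register("lang", () -> trigramOverlap(samples));
        LanguageIdentifierPlugin plugin = new LanguageIdentifierPlugin(factory);
        System.out.println(plugin.identify("the lazy dog"));
    }
}
```

Keeping configuration inside the factory means the plugin only names the categorizer it wants. Another plugin could register and request a differently configured categorizer without touching the lang-id code.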

--
 Sami Siren
