Jérôme Charron wrote:
jar. A short-term solution could be to move the core classes (which have no
dependencies on Nutch) to a new lib-plugin (lib-lang, for instance) and add a
dependency on this plugin in the language-identifier, so that this code could
be used as a standalone lib.
Are you OK with such changes?
Perhaps you could isolate the ngram-specific stuff into its own plugin and
the lang-id into another.
Or the other option would be (what I implemented some time ago)
something like this (as an ngram categorizer can also be used for other
interesting stuff):
new package in core nutch containing classes like:
NGramProfile - pretty much as is
Categorizer - generic configurable ngram categorizer, configure
profiles, ngram sizes etc.
CategorizerFactory - to get hold of different categorizers
In the LangId plugin you just get a correct categorizer (configured to use
the language ngram profiles and predefined settings for ngram sizes etc.)
from the factory and tell it to do its job when needed.
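
Roughly, that layout could be sketched in Java as below. This is only an
illustration, not the existing Nutch code: the method names (addProfile,
categorize, distanceTo, languageCategorizer), the rank-based "out-of-place"
distance, and the tiny inline sample texts are all assumptions made for the
example. A real lang-id categorizer would load its profiles from trained
profile files.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for NGramProfile: maps each n-gram to its frequency rank.
class NGramProfile {
    final String name;
    final Map<String, Integer> ranks = new HashMap<>(); // n-gram -> rank, 0 = most frequent

    NGramProfile(String name, String text, int minSize, int maxSize) {
        this.name = name;
        Map<String, Integer> counts = new HashMap<>();
        String s = text.toLowerCase();
        for (int n = minSize; n <= maxSize; n++) {
            for (int i = 0; i + n <= s.length(); i++) {
                counts.merge(s.substring(i, i + n), 1, Integer::sum);
            }
        }
        List<String> sorted = new ArrayList<>(counts.keySet());
        // Most frequent first; ties broken alphabetically so ranks are deterministic.
        sorted.sort((a, b) -> {
            int byCount = counts.get(b) - counts.get(a);
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        for (int i = 0; i < sorted.size(); i++) {
            ranks.put(sorted.get(i), i);
        }
    }

    // Sum of rank differences; n-grams missing from 'other' cost a fixed penalty.
    int distanceTo(NGramProfile other) {
        int penalty = Math.max(1, other.ranks.size());
        int distance = 0;
        for (Map.Entry<String, Integer> e : ranks.entrySet()) {
            Integer r = other.ranks.get(e.getKey());
            distance += (r == null) ? penalty : Math.abs(r - e.getValue());
        }
        return distance;
    }
}

// Generic categorizer: configured with n-gram sizes and a set of named profiles.
class Categorizer {
    private final int minSize;
    private final int maxSize;
    private final List<NGramProfile> profiles = new ArrayList<>();

    Categorizer(int minSize, int maxSize) {
        this.minSize = minSize;
        this.maxSize = maxSize;
    }

    void addProfile(String category, String trainingText) {
        profiles.add(new NGramProfile(category, trainingText, minSize, maxSize));
    }

    // Returns the name of the closest profile, or null if none are configured.
    String categorize(String text) {
        NGramProfile doc = new NGramProfile("doc", text, minSize, maxSize);
        String best = null;
        int bestDistance = Integer.MAX_VALUE;
        for (NGramProfile p : profiles) {
            int d = doc.distanceTo(p);
            if (d < bestDistance) {
                bestDistance = d;
                best = p.name;
            }
        }
        return best;
    }
}

// Factory handing out preconfigured categorizers (sample texts are made up).
class CategorizerFactory {
    static final String EN_SAMPLE =
        "the quick brown fox jumps over the lazy dog and then the cat sat on the mat";
    static final String FR_SAMPLE =
        "le renard brun saute par dessus le chien paresseux et le chat est sur le tapis";

    static Categorizer languageCategorizer() {
        Categorizer c = new Categorizer(1, 3); // assumed n-gram sizes
        c.addProfile("en", EN_SAMPLE);
        c.addProfile("fr", FR_SAMPLE);
        return c;
    }
}

class LangIdDemo {
    public static void main(String[] args) {
        Categorizer langId = CategorizerFactory.languageCategorizer();
        System.out.println(langId.categorize("the cat and the dog"));
    }
}
```

The plugin side then stays small: it asks the factory for the language
categorizer and calls categorize() on the page text, while other uses of
ngram categorization would get their own factory method with different
profiles and sizes.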
--
Sami Siren