Jérôme Charron wrote:
> I would be disappointed by this move - language identifier is an
> important component in Nutch. Now the mere fact that it's bundled
> with Nutch encourages its proper maintenance. If there is enough
> drive in terms of willingness and long-term commitment it would
> make sense to move it to a separate project on its own (or maybe as
> a part of Jakarta Commons), but moving it into a catch-all purely
> optional category like Lucene contrib would increase risks that it
> slides into oblivion...
Ok, Andrzej, I really understand your meaning. But more and more
people are contacting me directly in order to use the
language-identifier, but not as a nutch plugin, simply as a
standalone library. They get confused when I explain to them that they
need the nutch jar in order to use the language-identifier. That's
why I would like to make it a standalone jar. A short-term solution
could be to move the core classes (which have no dependencies on
nutch) to a new lib-plugin (lib-lang, for instance) and add a
dependency on this plugin in the language-identifier, so that this
code could be used as a standalone lib.
Are you OK with such changes?
Yes, certainly, it's a good intermediate step before moving it to a
separate project.
There are some other things that Doug mentioned that he would like to
separate from Nutch, like the IO and mapred frameworks. A similar
approach could be taken with these parts - this would encourage good
separation in design, and also prepare these parts to be separated into
their own projects.
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com