> Before doubling (or after 0.9.0 tripling?) the maintenance/development work
> please consider the following:
> 
> One option would be refactoring the code in a way that the parts that are
> usable by other projects, like protocols(?) and parsers (this was actually
> proposed by Jukka Zitting some time last year) and so on, would be modified
> to be independent of nutch (and hadoop) code. Yeah, this is easy to say, but
> it would require a significant amount of work.
> 
> The "more focused", smaller chunks of nutch would probably also get a bigger
> audience (perhaps also outside nutch land) and that way perhaps more people
> willing to work on them.
> 
> Don't know about others, but at least I would be more willing to work towards
> this goal than one where there would be practically many separate projects,
> each sharing common functionality but with a different code base.

+1 ;)

This was actually the project proposed by Jerome Charron and myself, called
"Tika". We went so far as to create a project proposal, and send it out to
the nutch-dev list, as well as the Lucene PMC for potential Lucene
sub-project goodness. I could probably dig up the proposal should the need
arise.

Good ol' Jukka then took that effort and created a project for us on Google
Code, which in fact still lives there:

http://code.google.com/p/tika/

There hasn't been active development on it because:

1. None of us (I'm speaking for Jerome, and myself here) ended up having the
time to shepherd it going forward

2. There was little, if any, response to the proposal on the nutch-dev
list, and few folks willing to contribute (besides people like Jukka)

3. I think, as you correctly note above, most people thought it to be too
much of a Herculean effort that wouldn't pay the necessary dividends in the
end to undertake it


In any case, I think that, if we are going to maintain separate branches of
the source, in fact, really parallel projects, then an undertaking such as
Tika is properly needed ...

Cheers,
   Chris




> 
> --
>  Sami Siren

