Yes, certainly, anything that can be shared and decoupled from the pieces that make 
each branch (not an SVN/CVS branch) different should be decoupled.  But I was 
really curious about whether people think this is a valid idea/direction, not 
necessarily immediately how things should be implemented.  In my mind, one 
branch is the branch that runs on top of Hadoop, with NameNode, DataNode, HDFS, 
etc.  That's the branch that's in the trunk.  The other branch is a simpler 
branch without all that Hadoop stuff, for folks who need to fetch, index, and 
search a few hundred thousand or a few million or even a few tens of millions 
of pages, and don't need the replication, etc. that comes with Hadoop.  That branch 
could be based off of 0.7.  I also know that a lot of people are trying to use 
Nutch to build vertical search engines, so there is also a need for a focused 
fetcher.  Kelvin Tan brought this up a few times, too, I believe.

I *think* there is a need for that.
I *can't* help shepherd this, but wanted to bring this up, in case there are 
people lurking who want to work on this.

Otis

----- Original Message ----
From: Sami Siren <[EMAIL PROTECTED]>
To: nutch-dev@lucene.apache.org
Sent: Monday, January 22, 2007 10:52:47 AM
Subject: Re: Reviving Nutch 0.7

Chris Mattmann wrote:
> In any case, I think that, if we are going to maintain separate branches of
> the source, in fact, really parallel projects, then an undertaking such as
> Tika is properly needed ...

I still don't think we need a separate project to start with; IMO the right
mindset is enough to get going. If people think this is the right
direction and it goes beyond talk, then perhaps after that we could start
talking about a separate project.


--
 Sami Siren




