Yes, certainly: anything that can be shared and decoupled from the pieces that make each branch (not an SVN/CVS branch) different should be decoupled. But I was really curious whether people think this is a valid idea/direction, not necessarily how things should be implemented right away.

In my mind, one branch is the one that runs on top of Hadoop, with NameNode, DataNode, HDFS, etc. That's the branch that's in the trunk. The other is a simpler branch without all that Hadoop stuff, for folks who need to fetch, index, and search a few hundred thousand, a few million, or even a few tens of millions of pages, and who don't need the replication, etc. that comes with Hadoop. That branch could be based off of 0.7.

I also know that a lot of people are trying to use Nutch to build vertical search engines, so there is also a need for a focused fetcher. Kelvin Tan brought this up a few times, too, I believe.
I *think* there is a need for that. I *can't* help shepherd this, but wanted to bring this up, in case there are people lurking who want to work on this.

Otis

----- Original Message ----
From: Sami Siren <[EMAIL PROTECTED]>
To: nutch-dev@lucene.apache.org
Sent: Monday, January 22, 2007 10:52:47 AM
Subject: Re: Reviving Nutch 0.7

Chris Mattmann wrote:
> In any case, I think that, if we are going to maintain separate branches of
> the source, in fact, really parallel projects, then an undertaking such as
> Tika is properly needed ...

I still don't think we need separate project to start with, IMO right mode of
mind is enough to get going. If people think this is right direction and it goes
beyond talk then perhaps after that we could start talking about separate project.

--
Sami Siren