Hi Guys,

I thought I'd chime in on this thread. My comments below:

> I understand and share your frustration, however you need to bear in mind
> that things are done only if people volunteer and have time - usually taken
> from their holiday, weekends, evenings. Chris (who is the de facto release
> master for Nutch and Gora) has not had the time and nobody else has
> volunteered to do it.

Yep, I haven't had the time to push a Gora 0.1.1-incubating release that will
address the Maven issues. However, it is on my roadmap for open source
stuff to get done in the next month, so that's a good thing. But yes, that
portion of my open source work is all volunteer time, so sometimes other
things take priority.

> 
> 
>> As it happens, yesterday was the 1 year anniversary of the last
>> successful Hudson/Jenkins build...  If that actually worked, we could
>> point people towards it as a useful recipe for how to get a build
>> working off trunk.  I haven't been following Nutch too closely, but it
>> always strikes me as really odd, that there's a nightly build and it
>> doesn't bother anybody that it fails all the time (and that there
>> isn't a nightly build for the stable branches).
>> 
> 
> The real issue behind all this is what we should do with Nutch 2.0. What
> follows is only my opinion and I would love to hear what others have to say
> on this subject.
> 
> Since we (actually mostly Dogacan) wrote 2.0 and delegated the storage to
> Gora, the latter hasn't really taken off since incubation. There have been
> some modest contributions to it but it does not seem to be used much and
> there is virtually nothing happening on it in terms of development. More
> worryingly, the people who initially contributed to it are not very active
> on the project (such is life, new jobs, different projects, etc...)
> anymore. As for Nutch 2.0, it hasn't made any progress in the last 12
> months : we still have the same bugs, the tests do not work, the build has
> to be done manually etc...

Yep.

> 
> At the same time, there has been a new lease of life into Nutch as a whole :
> there is definitely more activity on the mailing lists, new users, new
> active committers  etc... and quite a few bugfixes and improvements - most
> of them backported from what had been done in the trunk and people seem
> fairly happy with what we can do with 1.4

Totally agreed. I'm actually not super surprised -- ever since 1.1, I kind of
felt that maintaining a stable 1.X branch of Nutch (in parallel to the 2.0
efforts) was really going to pay off since there was renewed interest from
users in leveraging (and furthermore accepting) the nuances of 1.X.

> 
> So the question is : what shall we do with 2.0? Here are a few possibilities
> :
> 
> a) put some effort into it, fix the bugs and make so that it can be used
> instead of 1.x
> b) shelve it and leave it for enthusiasts to play with + make 1.x the trunk
> again
> c) do nothing : keep 2.0 and 1.x in parallel  (but having to maintain two
> branches is quite a pain)
> d) abandon the idea of a neutral storage layer with Gora and hardwire it to
> e.g. HBase
> 
> Option (a) has not happened in the last 12 months and I am not very hopeful
> about it.
> 
> What do you guys think?

I'd suggest an option e). Evolve and keep releasing 1.X over the next 6 months,
and keep 2.0 in the trunk. After 6 months, see how close 1.X is to actually
being 2.0 (e.g., did we release a 1.4, a 1.5, a 1.6?). If we get to ~1.6 over
the next 6 months and there is still no active development on 2.0, I'd propose
we do this at that point in time:

1. branch the current trunk as 
https://svn.apache.org/repos/asf/nutch/branches/nutchgora
2. grab the latest stable branch (e.g.,
https://svn.apache.org/repos/asf/nutch/branches/branch-1.6) and
*replace* the Nutch trunk with it, and bump the version # to 1.7-dev
3. active development on stable becomes active development in trunk and
nutchgora still exists in case anyone ever resurrects it.

That way, we give another 6 months to see how it shakes out and potentially
allow for 1 or 2 or 3 more stable releases before switching those over to
trunk.

Thoughts?

BTW, I have a couple of contributions from my CS572: Search Engines class from
a year ago that I'd love to port into the Nutch stable branch, including
Hubs/Authorities ranking and some other goodies. I'll try and work on those
over the next few months; I'm just letting everyone know now so I don't forget
again :-)

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattm...@nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
