Julien, devs, users,
I'd like to see bugs fixed in 2.0 but some of them are way out of my league or
would cost me an absurd amount of time. I'd also really like to use Gora but
Gora must be maintained. Gora will play a fundamental role in 2.0 and if
something is broken there it is not trivial to fix it for us Nutch devs as it
is yet another component to worry about.
Tika goes well, it's worked on and there is good enough progress to rely on
from our perspective. If this is not going to be the case with Gora we should
maybe decide to drop it and hardwire HBase instead.
Maintaining 1.x and 2.x is a pain indeed. I'd prefer option A) but I'm not
sure the currently active Nutch devs are going to fix it just like that.
On Tuesday 09 August 2011 17:10:12 Julien Nioche wrote:
> Hi Kirby,
> > Grumble, Grumble. (adding dev@nutch, as that is more than likely
> > where this discussion really belongs)...
> am adding gora-...@incubator.apache.org as well
> > It'd be really nice if folks could just follow the commands in the
> > nightly build, and get a build pushed out. I've pointed this out
> > previously, and was told this would be fixed "shortly" (right after
> > GORA-0.1 finally got released, but not published in public maven repo,
> > which as far as I know, it still isn't published, but I stopped
> > checking on it).
> I understand and share your frustration, however you need to bear in mind
> that things are done only if people volunteer and have time - usually taken
> from their holiday, weekends, evenings. Chris (who is the de facto release
> master for Nutch and Gora) has not had the time and nobody else has
> volunteered to do it.
> > As it happens, yesterday was the 1 year anniversary of the last
> > successful Hudson/Jenkins build... If that actually worked, we could
> > point people towards it as a useful recipe for how to get a build
> > working off trunk. I haven't been following Nutch too closely, but it
> > always strikes me as really odd, that there's a nightly build and it
> > doesn't bother anybody that it fails all the time (and that there
> > isn't a nightly build for the stable branches).
> The real issue behind all this is what we should do with Nutch 2.0. What
> follows is only my opinion and I would love to hear what others have to say
> on this subject.
> Since we (actually mostly Dogacan) wrote 2.0 and delegated the storage to
> Gora, the latter hasn't really taken off since incubation. There have been
> some modest contributions to it but it does not seem to be used much and
> there is virtually nothing happening on it in terms of development. More
> worryingly, the people who initially contributed to it are not very active
> on the project (such is life, new jobs, different projects, etc...)
> anymore. As for Nutch 2.0, it hasn't made any progress in the last 12
> months : we still have the same bugs, the tests do not work, the build has
> to be done manually etc...
> At the same time, there has been a new lease of life into Nutch as a whole
> : there is definitely more activity on the mailing lists, new users, new
> active committers etc... and quite a few bugfixes and improvements - most
> of them backported from what had been done in the trunk and people seem
> fairly happy with what we can do with 1.4.
> So the question is : what shall we do with 2.0? Here are a few options:
> a) put some effort into it, fix the bugs and make it so that it can be used
> instead of 1.x
> b) shelve it and leave it for enthusiasts to play with + make 1.x the trunk
> c) do nothing : keep 2.0 and 1.x in parallel (but having to maintain two
> branches is quite a pain)
> d) abandon the idea of a neutral storage layer with Gora and hardwire it to
> e.g. HBase
> Option (a) has not happened in the last 12 months and I am not very hopeful
> about it.
> What do you guys think?
Markus Jelsma - CTO - Openindex
050-8536620 / 06-50258350