Hi,

Without changing the flow of the conversation or the points which have already been touched upon, I would like to add:

I am really split between a couple of options here. I like the abstraction that Gora provides, even though it is somewhat of a pain to configure, which also presents a barrier to adoption for devs. That said, Gora is a fundamental component of Nutch 2.0, and once you get to grips with the config and the flexibility it offers, you are presented with an excellent setup for Nutch 2.0. I understand people's concerns and why they would wish to hardwire to HBase; however, I would like to point to a (rather lengthy) thread I found last night while thinking about my position in this whole affair [1]. In essence it reflects exactly what Julien has mentioned below, as well as adding a hellish lot more!

I am also with Markus on this one. However, there is no point in me being anything other than totally honest: some of the bugs in trunk 2.0 we are talking about are pretty substantial (I don't even know them all), especially when the API changes are taken into account, so I would be learning as I chipped in my part. This would inevitably lead to slower progress on Nutch 2.0 than we would all hope for, bearing in mind several devs' other commitments both in and out of the ASF. Is this something which can be tolerated, or are we to put suggestions in place which adhere to the "release early, release often" ethos and try to get something out of the door?

If we could get an official release for Nutch 2.0, community testing could commence, and instead of improvement suggestions ending up in JIRA tickets we would be getting bugs filed specifically against 2.0 as independent issues. This would inevitably lead to a better trunk development environment for us all. One counterpoint to veering towards option A) is that we met a small amount of resistance when Nutch 1.3 was released. Would making Nutch 2.0 mainstream, the de facto version for Nutch users, be a step too far for some of them?

I am a firm believer that we should do whatever is necessary to get trunk building under Hudson. It seems like a waste of resources that we have the potential for a stable build environment but are not taking advantage of it. Obviously I am unaware of exactly what is preventing this, hence my keenness to get it sorted out, but surely we must all agree that it would be beneficial, from a morale point of view as well. If we see that trunk is building successfully, there might be a better feeling about people developing not only on trunk 2.0 but also on Gora and the other components upon which trunk 2.0 depends.

Further to this, is there any consensus on getting a Jenkins build established for branch 1.X? It is quite clear that this is our working development strand, so would this not make sense? I have been looking through the wiki [2], and any committer can get it set up once the PMC chair makes some minor requests on people.apache.org.

Finally, with regard to the Ant/Ivy configuration, I am quite happy with the current setup. If someone puts forward a reasonable argument for changing to Ant/Maven or any other configuration, I will certainly be interested, provided it adds value to the project. I must agree that changing something which is not broken is far from the direction I had envisaged we were moving in... quite the opposite, in fact.

[1] http://www.mail-archive.com/dev@nutch.apache.org/msg00216.html
[2] http://wiki.apache.org/general/Hudson

On Wed, Aug 10, 2011 at 10:20 AM, Markus Jelsma <markus.jel...@openindex.io> wrote:

> Julien, devs, users,
>
> I'd like to see bugs fixed in 2.0 but some of them are way out of my league or would cost me an absurd amount of time. I'd also really like to use Gora but Gora must be maintained. Gora will play a fundamental role in 2.0 and if something is broken there it is not trivial to fix it for us Nutch devs as it is yet another component to worry about.
>
> Tika goes well, it's worked on and there is good enough progress to rely on from our perspective. If this is not going to be the case with Gora we should maybe decide to drop it and hardwire HBASE in it.
>
> Maintaining 1.x and 2.x is a pain indeed. I'd prefer option A) but i'm not sure the currently active Nutch devs are going to fix it just like that.
>
> Cheers,
>
> > a) put some effort into it, fix the bugs and make so that it can be used instead of 1.x
> > b) shelve it and leave it for enthusiasts to play with + make 1.x the trunk again
> > c) do nothing : keep 2.0 and 1.x in parallel (but having to maintain two branches is quite a pain)
> > d) abandon the idea of a neutral storage layer with Gora and hardwire it to e.g. HBase
> >
> > Option (a) has not happened in the last 12 months and I am not very hopeful about it.
> >
> > What do you guys think?
> >
> > Julien
>
> --
> Markus Jelsma - CTO - Openindex
> http://www.linkedin.com/in/markus17
> 050-8536620 / 06-50258350

--
*Lewis*