Hey Markus,

No worries. I actually have no dog in this fight, to be honest.
I want Gora to be successful, and I want Nutch to be successful. I haven't
contributed much to the Nutch 2.0 trunk, but I have to the 1.x series branch.
I wish I knew more about Gora's internals (and am trying to learn) so I could
help more with it. I think it will make a lot of sense to use it at some point.

At the same time, I'm all for making 1.x releases and naturally getting to 2.0
over time based on our current progress and understanding. I'm also super
excited about the 1.x versions of Nutch, and when I think about it, the reality
is that they've always been Nutch trunk, even though we artificially tried to
turn the nutchbase branch into it.

So to wrap it up, I'm totally fine with 1.x moving into trunk and with
executing the plan I proposed a while back:

---snip
1. branch the current trunk as
   https://svn.apache.org/repos/asf/nutch/branches/nutchgora
2. grab latest stable branch (e.g.,
   https://svn.apache.org/repos/asf/nutch/branches/branch-1.6) and *replace*
   the Nutch trunk with it, and bump the version # to 1.7-dev
3. active development on stable becomes active development in trunk and
   nutchgora still exists in case anyone ever resurrects it.
---snip

Of course, it's not 1.6 (I was optimistic about getting there in 6 months ;) ),
but it's really 1.4. And we don't need to bump to -dev since we're already in
full dev with the 1.4 cycle.

So, I'm ready for a VOTE. Feel free to call one (or have Julien do it), and
I'll VOTE +1.

Cheers,
Chris

On Sep 17, 2011, at 10:18 AM, Markus Jelsma wrote:

> Hi Chris,
>
> I initially respawned this thread with the suggestion not to wait until
> January or so before the vote. Hence my apologies for being impatient and
> pessimistic about trunk :)
>
> Cheers,
>
>> Hey Julien,
>>
>> My option E was pretty much equivalent to B except I specified a time frame
>> (next 6 months). Are we just saying that we'll accelerate the time frame
>> to say, umm, next week or the week after? :)
>>
>> If so, fine by me.
>> Since I moved nutchbase into the trunk at one point, I'd be happy, once
>> we've VOTEd and decided, to be the one to execute moving it out.
>>
>> And yes, PMC votes will be binding and we'll do majority takes it, fine by
>> me.
>>
>> Cheers,
>> Chris
>>
>> On Sep 17, 2011, at 1:45 AM, Julien Nioche wrote:
>>> Let's keep it simple. Let's vote for option B (i.e. shelve 2.0), if most
>>> people are in favour then we don't need to look into other options at
>>> all. If not, we'll see what alternatives or arguments come up and vote
>>> on these later.
>>>
>>> I assume that only PMC votes will be binding and the majority takes it?
>>>
>>> Julien
>>>
>>> On 16 September 2011 22:30, Mattmann, Chris A (388J)
>>> <[email protected]> wrote:
>>> Why don't we just collect VOTEs for each of the options a-e, and then
>>> figure out based on that if there is a majority. If there's no majority,
>>> we can whittle it down to say the top 2-3, and then VOTE on those,
>>> looking for majority again.
>>>
>>> Cheers,
>>> Chris
>>>
>>> On Sep 16, 2011, at 11:44 AM, Markus Jelsma wrote:
>>>> Option B) Shelve trunk in a branch and promote 1.4 to trunk. We can
>>>> always choose to hardwire HBase (option D) later.
>>>>
>>>> Markus
>>>>
>>>>> Am happy to call for a vote on the future of Nutch 2.0 if you want.
>>>>> Shall we reduce the various options described before to a single one?
>>>>>
>>>>> Julien
>>>>>
>>>>> On 15 September 2011 19:55, Markus Jelsma <[email protected]> wrote:
>>>>>>> Hi Guys,
>>>>>>>
>>>>>>> I thought I'd chime in on this thread. My comments below:
>>>>>>>> I understand and share your frustration, however you need to bear
>>>>>>>> in mind that things are done only if people volunteer and have
>>>>>>>> time - usually taken from their holiday, weekends, evenings. Chris
>>>>>>>> (who is the de facto release master for Nutch and Gora) has not had
>>>>>>>> the time and nobody else has volunteered to do it.
>>>>>>>
>>>>>>> Yep I haven't had the time to push a Gora 0.1.1-incubating release
>>>>>>> that will address the Maven issues. However it is on my roadmap for
>>>>>>> open source stuff to get done in the next month, so that's a good
>>>>>>> thing. But yes, that portion of my open source work is all volunteer
>>>>>>> time, so sometimes other things take priority.
>>>>>>>
>>>>>>>>> As it happens, yesterday was the 1 year anniversary of the last
>>>>>>>>> successful Hudson/Jenkins build... If that actually worked, we
>>>>>>>>> could point people towards it as a useful recipe for how to get a
>>>>>>>>> build working off trunk. I haven't been following Nutch too
>>>>>>>>> closely, but it always strikes me as really odd, that there's a
>>>>>>>>> nightly build and it doesn't bother anybody that it fails all the
>>>>>>>>> time (and that there isn't a nightly build for the stable
>>>>>>>>> branches).
>>>>>>>>
>>>>>>>> The real issue behind all this is what we should do with Nutch 2.0.
>>>>>>>> What follows is only my opinion and I would love to hear what others
>>>>>>>> have to say on this subject.
>>>>>>>>
>>>>>>>> Since we (actually mostly Dogacan) wrote 2.0 and delegated the
>>>>>>>> storage to Gora, the latter hasn't really taken off since
>>>>>>>> incubation. There have been some modest contributions to it but it
>>>>>>>> does not seem to be used much and there is virtually nothing
>>>>>>>> happening on it in terms of development. More worryingly, the people
>>>>>>>> who initially contributed to it are not very active on the project
>>>>>>>> (such is life, new jobs, different projects, etc...) anymore. As for
>>>>>>>> Nutch 2.0, it hasn't made any progress in the last 12 months : we
>>>>>>>> still have the same bugs, the tests do not work, the build has to be
>>>>>>>> done manually etc...
>>>>>>>
>>>>>>> Yep.
>>>>>>>
>>>>>>>> At the same time, there has been a new lease of life into Nutch as
>>>>>>>> a whole : there is definitely more activity on the mailing lists,
>>>>>>>> new users, new active committers etc... and quite a few bugfixes
>>>>>>>> and improvements - most of them backported from what had been done
>>>>>>>> in the trunk and people seem fairly happy with what we can do with
>>>>>>>> 1.4
>>>>>>>
>>>>>>> Totally agreed. I'm actually not super surprised -- ever since 1.1,
>>>>>>> I kind of felt that maintaining a stable 1.X branch of Nutch (in
>>>>>>> parallel to the 2.0 efforts) was really going to pay off since there
>>>>>>> was renewed interest from users in leveraging (and furthermore
>>>>>>> accepting) the nuances of 1.X.
>>>>>>>
>>>>>>>> So the question is : what shall we do with 2.0? Here are a few
>>>>>>>> possibilities
>>>>>>>>
>>>>>>>> a) put some effort into it, fix the bugs and make so that it can be
>>>>>>>>    used instead of 1.x
>>>>>>>> b) shelve it and leave it for enthusiasts to play with + make 1.x
>>>>>>>>    the trunk again
>>>>>>>> c) do nothing : keep 2.0 and 1.x in parallel (but having to
>>>>>>>>    maintain two branches is quite a pain)
>>>>>>>> d) abandon the idea of a neutral storage layer with Gora and
>>>>>>>>    hardwire it to e.g. HBase
>>>>>>>>
>>>>>>>> Option (a) has not happened in the last 12 months and I am not very
>>>>>>>> hopeful about it.
>>>>>>>>
>>>>>>>> What do you guys think?
>>>>>>>
>>>>>>> I'd suggest an option e). Evolve and keep releasing 1.X over the
>>>>>>> next 6 months, and keep 2.0 in the trunk. After 6 months, see how
>>>>>>> close 1.X is to actually being 2.0 (e.g., did we release a 1.4, a
>>>>>>> 1.5, a 1.6?)
>>>>>>> If we get to ~1.6 over the next 6 months and there is still no
>>>>>>> active development on 2.0, I'd propose we do this at that point in
>>>>>>> time:
>>>>>>>
>>>>>>> 1. branch the current trunk as
>>>>>>>    https://svn.apache.org/repos/asf/nutch/branches/nutchgora
>>>>>>> 2. grab latest stable branch (e.g.,
>>>>>>>    https://svn.apache.org/repos/asf/nutch/branches/branch-1.6) and
>>>>>>>    *replace* the Nutch trunk with it, and bump the version # to
>>>>>>>    1.7-dev
>>>>>>> 3. active development on stable becomes active development in trunk
>>>>>>>    and nutchgora still exists in case anyone ever resurrects it.
>>>>>>>
>>>>>>> That way, we give another 6 months to see how it shakes out and
>>>>>>> potentially allow for 1 or 2 or 3 more stable releases before
>>>>>>> switching those over to trunk.
>>>>>>>
>>>>>>> Thoughts?
>>>>>>
>>>>>> Yes. I don't believe we should wait until January before discussing
>>>>>> this topic again. I, for example, cannot spend considerable extra
>>>>>> time on the issues I put in 1.4, also due to the fact that it's not
>>>>>> entirely stable.
>>>>>>
>>>>>> There are many things I can write about this topic right now but
>>>>>> don't feel it's necessary. The choice is difficult and perhaps
>>>>>> painful but when the voting round is opened by our project lead, I
>>>>>> will vote for promoting 1.x back to trunk.
>>>>>>
>>>>>> My apologies for my impatience and pessimism.
>>>>>>
>>>>>>> BTW, I have a couple contributions from my CS572: Search Engines
>>>>>>> class from a year ago that I'd love to port into the Nutch stable
>>>>>>> branch including Hubs/Authorities ranking and some other goodies.
>>>>>>> I'll try and work on those over the next few months, I'm just
>>>>>>> letting everyone know now so I don't forget again :-)
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Chris
>>>>>>>
>>>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>> Chris Mattmann, Ph.D.
>>>>>>> Senior Computer Scientist
>>>>>>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>>>>>>> Office: 171-266B, Mailstop: 171-246
>>>>>>> Email: [email protected]
>>>>>>> WWW: http://sunset.usc.edu/~mattmann/
>>>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>> Adjunct Assistant Professor, Computer Science Department
>>>>>>> University of Southern California, Los Angeles, CA 90089 USA
>>>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

