> Or, if you have experience with JSPs/GUI work, then I think there's this > big open issue around improving the Nutch GUI, which would likely provide > the most benefit to the most users. I haven't been following the current > status, but I know that there have been periodic discussions, and I think > 101tec did some work on this a while back (for a client), but I don't know > if that's been contributed (or could be, for that matter). >
A related issue is porting the REST-API from nutchgora to trunk (https://issues.apache.org/jira/browse/NUTCH-880), which in turn could be used by a GUI. > > -- Ken > > On Jan 21, 2012, at 8:17am, Edward Drapkin wrote: > > On 1/21/2012 8:27 AM, Lewis John Mcgibbney wrote: > > Hi Julien, > > > There are 8 issues in trunk about the fetcher - some of them unrelated >> to the Fetcher (NUTCH-827<https://issues.apache.org/jira/browse/NUTCH-827>/ >> Nutch-1193) with most of the others being improvements ( >> NUTCH-828 <https://issues.apache.org/jira/browse/NUTCH-828> / >> NUTCH-1079<https://issues.apache.org/jira/browse/NUTCH-1079>) >> with possibly just a very few being real issues. > > > This puts the whole discussion into much better context, thanks for > pointing this out. Maybe I should have made it clearer that I only > filtered the fetcher issues on our Jira and I was simply modelling my > discussion around that. You are completely correct though, it would be > different if the fetcher was in a similar state to protocol-httpclient... > which it is obviously not. > > >> I am also concerned about getting too radical changes to such a core part >> of the framework, especially when more pressing issues could be looked >> after instead. > > +1 > > >> Having said that if someone can come up with an interesting proposal for >> improving the Fetcher that would be very good, I would simply suggest that >> we then have a separate implementation for that. >> > +1 > > >> >> >> Ok with this in mind then, is there some guidance we can communicate to > Eddie? He has specifically mentioned that he shares similar opinions wrt > the fetcher being a core part of Nutch, radical changes etc, and I also > share this point of view. He has also added that he doesn't want to spend > the time changing material which we may or may not merge with trunk; this > also makes perfect sense. 
Additionally, Ken's comments emphasise that this > has been somewhat attempted in the past and that lessons have been learned > and the implementation we have cuts the mustard as is. > Maybe we could nudge Eddie in the right direction, which would benefit > both himself and the project over the next while. I think this was the most > important point I was trying to emphasise; however, looking over my original > comment, this was maybe not how it came across. > > Thanks > Lewis > > > If there are more important and/or interesting things for me to work on, > I'll be glad to. I'm completely unfamiliar with the current state of the > project as a whole - and looking through JIRA is a bit daunting. The only > reason I'm attracted to working on the fetcher is that I think it's a really > interesting and compelling problem to solve, and making it more > flexible is something that would directly benefit our use for it, so it > will be easier to devote time to it while I'm at the office. I do have a > glut of free time at the moment though, so I'm perfectly okay working on > another area that's more pressing - I just don't know what it is. I saw > that protocol-httpclient needs to be rewritten; is there someone working on > that? > > I can work on more important and less controversial / radical things, but > I do think that having a more flexible, pluggable fetcher will be an > enormous improvement to Nutch and can greatly expand the potential uses for > it as a piece of software. There are a ton of cases where pluggable fetching > could be a huge improvement: local filesystem search, single-threaded / > small site indexing, email indexing (SMTP, POP, etc.), etc. I suggested an > extremely (perhaps too much so) abstract architecture for fetching in ticket > #1201, and for the sake of brevity I won't repeat myself here, but I think > that would give Nutch a good base for flexible fetching, which I believe is > a huge improvement to the project. 
I'm obviously new to the development > here and I'm willing to do whatever needs doing; I just believe the fetching > is something that needs doing. I just want to contribute! > > Thanks, > Eddie > > > -------------------------- > Ken Krugler > http://www.scaleunlimited.com > custom big data solutions & training > Hadoop, Cascading, Mahout & Solr > > > > > -- Open Source Solutions for Text Engineering http://digitalpebble.blogspot.com/ http://www.digitalpebble.com

