Dear Nutch users & developers,
Thank you for the warm welcome. I guess I'm now part of the family. I hope it will grow exponentially with the new version.

My Nutch story started in 2007 but only lasted for a few months. I resumed it in November 2010 through an exchange of comments on Julien's blog, about whether to use Nutch 1.2 or Nutch 2 (trunk) for my own purposes. The new design he suggested shifts radically from a full-fledged solution for search applications to a minimalistic project that does no indexing, storing, or parsing, just crawling. Delegating all the subsidiary tasks to more specialized projects should allow the Nutch community to focus on its core activity: downloading pages from the web automatically and as fast as possible, and preparing the data for analysis, while still respecting the web standards regarding robots.

I take advantage of this announcement to urge all users, new and more familiar, to migrate their crawls to this 2.0 version, even though it is still very much an alpha. It works, provided you apply a few patches here and there. Help will be very much appreciated, especially in kickstarting Gora, an embryonic project for data access in Map/Reduce.

IMHO, the high-priority items on the road map would be:

- Set up an Ivy configuration to build the first Gora release. Currently the Nutch build fails because the Gora dependency is missing from the Maven repository.
- Port the http-protocol plugin, which fetches content from the web, to HttpComponents' httpcore-nio in order to leverage non-blocking I/O.
- Design and improve the Gora & Nutch unit tests.

Don't hesitate to share your own impressions of the new design, the road map, and the potential improvements. If you wish to participate, please refer to the Nutch 2.0 section in the wiki. There are many ways to contribute: send a message to the mailing list, create an issue on JIRA (with or without a patch attached), update the wiki... Give it a shot!

Alexis
http://techvineyard.blogspot.com

On Tue, Feb 15, 2011 at 6:00 PM, Markus Jelsma <[email protected]> wrote:
> Great!
>
> On Tuesday 15 February 2011 17:49:40 Mattmann, Chris A (388J) wrote:
>> Hi Folks,
>>
>> A while back I nominated Alexis Detreglode for Nutch committership and PMC
>> membership. The VOTE tallies in Nutch PMC-ville have occurred and I'm happy
>> to announce that Alexis is now an Nutch committer!
>>
>> Alexis, feel free to say a little bit about yourself, and, welcome aboard!
>>
>> Cheers,
>> Chris
>>
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Chris Mattmann, Ph.D.
>> Senior Computer Scientist
>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>> Office: 171-266B, Mailstop: 171-246
>> Email: [email protected]
>> WWW: http://sunset.usc.edu/~mattmann/
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Adjunct Assistant Professor, Computer Science Department
>> University of Southern California, Los Angeles, CA 90089 USA
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
> --
> Markus Jelsma - CTO - Openindex
> http://www.linkedin.com/in/markus17
> 050-8536620 / 06-50258350
>

