+1 to the description w/o experimental too (I agree with Ferdy). You guys ROCK.
Cheers,
Chris

On Jun 13, 2012, at 5:29 AM, Lewis John Mcgibbney wrote:

> Hi,
>
> Seeing as we have the ball rolling with the 2.0 RC, I thought I'd ask
> about a suitable project descriptor.
>
> So far on trunk we have
>
> ** Apache Nutch is an open source web-search software project.
> Stemming from Apache Lucene, it now builds on Apache Solr adding
> web-specifics, such as a crawler, a link-graph database and parsing
> support handled by Apache Tika for HTML and an array of other document
> formats.
>
> This is merely a pot shot, but I was thinking for Nutch 2.0, something like
>
> ** Apache Nutch 2.X is an experimental branch of the Apache Nutch open
> source web-search software project. It builds on Apache Gora for data
> persistence and Apache Solr for indexing, adding web-specifics, such as
> a crawler, a link-graph database and parsing support handled by Apache
> Tika for HTML and an array of other document formats.
>
> Although there are not many changes here I just wanted to run it by
> you folks...?
>
> Thanks
> Lewis
>
> --
> Lewis

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattm...@nasa.gov
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++