Hmm, please ignore "Parse text is limited to 100 characters"; this is actually not the case. (It only happens in our branch, which has a fix for limiting anchor texts; that fix is not yet present in the nutchgora branch because it still needs polishing.) So there is no need to wait for commits on my part.

On Wed, Jun 13, 2012 at 11:00 AM, Ferdy Galema <[email protected]> wrote:

> Findings about Nutch-2.0 RC 1.
>
> The Nutch job jar is not present in the binary archive. This means
> distributed running of jobs is not supported. I'm not sure if this is a
> problem (since users can always build one themselves), merely pointing it
> out. The recently released 1.5 also lacks this job jar, so at least no
> difference there.
>
> Parse text is limited to 100 characters for html. We noticed this when our
> index wasn't showing enough terms for some documents. This is a pretty
> severe bug that I will commit a fix for right away.
>
> Building runtime with the default SqlStore and HBaseStore works fine. Will
> perform some more functionality tests when there is a new RC.
>
> Ferdy.
>
> On Wed, Jun 13, 2012 at 4:24 AM, Mattmann, Chris A (388J) <[email protected]> wrote:
>
>> Hey Guys,
>>
>> #2 is probably reason enough for a respin.
>>
>> Lewis if you don't have time to do it before Thursday, I could probably
>> give it a whack. Let me know.
>>
>> Cheers,
>> Chris
>>
>> On Jun 12, 2012, at 3:33 PM, Sebastian Nagel wrote:
>>
>> > Hi Lewis,
>> >
>> > my first steps with 2.0 (to be continued, still struggling).
>> >
>> > Two points (I'll try to give a final vote tomorrow):
>> >
>> > 1 some guidance would be nice. README.txt points
>> > to http://wiki.apache.org/nutch/NutchTutorial which refers to 1.x
>> > (I'm using http://sujitpal.blogspot.de/2012/01/exploring-nutch-gora-with-cassandra.html)
>> >
>> > 2 the package contains your nutch-site.xml:
>> > <name>http.agent.email</name>
>> > <value>[email protected]</value>
>> > I guess that's not intended :)
>> >
>> > Cheers,
>> > Sebastian
>> >
>> > On 06/12/2012 10:16 PM, Lewis John Mcgibbney wrote:
>> >> Hi Everyone,
>> >>
>> >> I appreciate that most of the core dev's are using trunk, however I
>> >> would appeal to you guys to at least check out the artifacts and check
>> >> sigs, tests, license headers if possible. Although this does not fully
>> >> satisfy the requirements of a thoroughly reviewed RC, hopefully the
>> >> thorough stuff can be undertaken by those directly using the artifacts
>> >> and code in development/production.
>> >>
>> >> Thanks very much in advance
>> >>
>> >> Best
>> >>
>> >> Lewis
>> >>
>> >> On Fri, Jun 8, 2012 at 3:49 PM, lewis john mcgibbney <[email protected]> wrote:
>> >>> Good Evening Everyone,
>> >>>
>> >>> A candidate for the Apache Nutch 2.0 RC1 is available at:
>> >>>
>> >>> http://people.apache.org/~lewismc/nutch-2.0
>> >>>
>> >>> The release candidate is a src.zip, bin.zip, src.tar.gz and bin.tar.gz
>> >>> archive of the sources in:
>> >>>
>> >>> http://svn.apache.org/repos/asf/nutch/tags/release-2.0rc1
>> >>>
>> >>> Further, a staged Maven repository of the 2.0 jar, sources.jar and
>> >>> javadoc.jar is available here:
>> >>>
>> >>> https://repository.apache.org/content/repositories/orgapachenutch-215
>> >>>
>> >>> Please vote on releasing this package as Apache Nutch 2.0.
>> >>> The vote is open for the next 72 hours and passes if a majority of at
>> >>> least three +1 Nutch PMC votes are cast.
>> >>>
>> >>> [ ] +1 Release this package as Apache Nutch 2.0
>> >>> [ ] -1 Do not release this package because...
>> >>>
>> >>> Many Thanks and heres to plenty more.
>> >>>
>> >>> Have a great weekend, Kind Regards,
>> >>> Lewis
>> >>>
>> >>> P.S. Here's my +1.
>>
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Chris Mattmann, Ph.D.
>> Senior Computer Scientist
>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>> Office: 171-266B, Mailstop: 171-246
>> Email: [email protected]
>> WWW: http://sunset.usc.edu/~mattmann/
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Adjunct Assistant Professor, Computer Science Department
>> University of Southern California, Los Angeles, CA 90089 USA
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

