Re: [VOTE] Apache Nutch 1.1 Release Candidate #2
Might I suggest that, since Nutch is now a TLP, you delay this release by a few weeks and have the vote done under the auspices of the Nutch PMC?

Cheers,
Grant

On Apr 26, 2010, at 1:55 AM, Mattmann, Chris A (388J) wrote:
> Since Nutch is now a TLP and has its own PMC, there is a question of who are the binding release VOTES in this particular thread. My gut reaction is that since I started this release while we were under the Lucene PMC, for continuity purposes, only votes from Lucene PMC are binding, but everyone (especially newly minted Nutch PMC members!) are welcome to check the release candidate and voice their approval or disapproval.
Re: Running ANT; was -- Re: [VOTE] Apache Nutch 1.1 Release Candidate #2
Hi David,

Thanks. In fact, running ant is probably simpler than running Nutch. The steps would be:

* Check which OS you are on. (Ant is available for all of them, to my knowledge.)
* If you need ant, grab a distribution from ant.apache.org. Otherwise, I'll assume that you've got ant installed and callable from the command line.
* Unpack the Nutch source distribution, cd into that directory, type "ant job", and there you go.

HTH! You could try it out by taking the Nutch source code from SVN at http://svn.apache.org/repos/asf/lucene/nutch/tags/1.1 and then following the steps above.

Cheers,
Chris

On 4/26/10 7:24 AM, David M. Cole d...@colegroup.com wrote:
> At 10:55 PM -0700 4/25/10, Mattmann, Chris A (388J) wrote:
>> Most folks that use Nutch are likely familiar with running ant IMHO.
> I guess then I fall into the category of not "most folks." Have been running Nutch for about 14 months and I haven't a clue how to run ant. If there's a place to vote to suggest that compiled versions still be distributed, I vote for that.
> Thanks.
> \dmc
> --
> David M. Cole d...@colegroup.com
> Editor Publisher, NewsInc. http://newsinc.net V: (650) 557-2993
> Consultant: The Cole Group http://colegroup.com/ F: (650) 475-8479
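The three steps above can be sketched in Python as follows. This is a minimal illustration, not part of the Nutch build: it assumes you have downloaded a source tarball (the archive path you pass in is hypothetical) that unpacks into a single top-level directory. It only shells out to the "ant job" target named above and makes no assumption about where the build writes its output.

```python
import shutil
import subprocess
import tarfile
from pathlib import Path


def source_root(archive: tarfile.TarFile) -> str:
    """Return the single top-level directory inside a source tarball."""
    tops = set()
    for member in archive.getmembers():
        parts = Path(member.name).parts
        if parts:
            tops.add(parts[0])
    if len(tops) != 1:
        raise ValueError(f"expected one top-level directory, found {sorted(tops)}")
    return tops.pop()


def build_nutch_job(src_archive: str, dest_dir: str = ".") -> Path:
    """Unpack a Nutch source tarball into dest_dir and run "ant job" there."""
    # Steps 1-2: ant must be installed and callable from the command line.
    ant = shutil.which("ant")
    if ant is None:
        raise RuntimeError("ant not found on PATH; get a distribution from ant.apache.org")

    with tarfile.open(src_archive) as archive:
        root = Path(dest_dir) / source_root(archive)
        # Use the safer "data" extraction filter where this Python supports it.
        extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        # Step 3a: unpack the source distribution.
        archive.extractall(dest_dir, **extra)

    # Step 3b: cd into the unpacked directory and type "ant job".
    subprocess.run([ant, "job"], cwd=root, check=True)
    return root
```

For example, build_nutch_job("apache-nutch-1.1-src.tar.gz") would unpack that (hypothetically named) archive into the current directory and run the build there; check=True makes a failed ant build raise an error instead of passing silently.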
Re: [VOTE] Apache Nutch 1.1 Release Candidate #2
Hi Grant,

Thanks. I think it actually makes sense to finish off 1.1. Since there is overlap between the Nutch PMC and the Lucene PMC, and since the thread started in Lucene before the TLP, I think it would be great if, e.g., Andrzej and Sami could check the release. That way we still have continuity and can safely push it out as the last Nutch release under the Lucene umbrella. All releases after 1.1 can then cleanly be done under the auspices of the new PMC :)

Cheers,
Chris

On 4/26/10 5:34 AM, Grant Ingersoll gsing...@apache.org wrote:
> Might I suggest, that since Nutch is now a TLP that you delay this release by a few weeks and have the vote done under the auspices of the Nutch PMC?
Re: [VOTE] Apache Nutch 1.1 Release Candidate #2
Hey Andrzej,

Okey dokey, np! Let's get the patch in first :) I can cut as many RCs as needed.

Cheers,
Chris

On 4/26/10 11:30 AM, Andrzej Bialecki a...@getopt.org wrote:
> On 2010-04-26 17:19, Mattmann, Chris A (388J) wrote:
>> ... that way we still have the continuity and can safely push it out as the last Nutch rel under the Lucene umbrella...
> I know that Dennis Kubes just discovered a bug in SegmentMerger (he may report on it in a moment). This bug has been there for a while, it's likely the cause of the mysterious out-of-disk-space errors, and it manifests itself only with input files larger than the HDFS block size (64MB). Since 1.1 is likely the final release of Nutch 1.x, I think it would make sense to fix this bug before we release ...
> --
> Best regards,
> Andrzej Bialecki
> Information Retrieval, Semantic Web
> Embedded Unix, System Integration
> http://www.sigram.com Contact: info at sigram dot com
[VOTE] Apache Nutch 1.1 Release Candidate #2
Hi Folks,

I have posted an updated candidate for the Apache Nutch 1.1 release. The source code is at:

http://people.apache.org/~mattmann/apache-nutch-1.1/rc2/

The major difference between this release and RC #1 is the application of NUTCH-812 (Crawl.java incorrectly uses the Generator API, resulting in an NPE), as well as some commits by Sami Siren to fix missing ASL license headers. For more detailed information on release contents and the latest changes, see the included CHANGES.txt file.

The release was made using the Nutch release process, documented on the Wiki here:

http://bit.ly/d5ugid

A Nutch 1.1 tag is at:

http://svn.apache.org/repos/asf/lucene/nutch/tags/1.1/

Note: Sami Siren requested that the tutorial be updated to reflect the fact that this release is a source-only release, and that RAT be integrated into the build. However, in the interest of getting 1.1 out and getting going on the Nutch TLP, my proposal is to:

* Update the docs independently of this release. The tutorial as it exists right now says 0.7 on it anyway and doesn't look like it's been updated in a while, so I think users can live with what's there, with support on u...@nutch.apache.org or d...@nutch.apache.org until it's updated.
* Begin source-only releases in general, since we've long had the debate about the size of the Nutch release. Most folks that use Nutch are likely familiar with running ant, IMHO.
* Run RAT and integrate it into the build.

Please vote on releasing these packages as Apache Nutch 1.1. The vote is open for the next 72 hours.

Since Nutch is now a TLP and has its own PMC, there is a question of whose release votes are binding in this particular thread. My gut reaction is that, since I started this release while we were under the Lucene PMC, for continuity purposes only votes from the Lucene PMC are binding. Everyone (especially newly minted Nutch PMC members!) is welcome to check the release candidate and voice their approval or disapproval. The vote passes if at least three binding +1 votes are cast.

[ ] +1 Release the packages as Apache Nutch 1.1.
[ ] -1 Do not release the packages because...

Thanks!

Cheers,
Chris

P.S. Here is my +1.

++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.mattm...@jpl.nasa.gov
WWW: http://sunset.usc.edu/~mattmann/
++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++
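The passing rule stated in this vote can be sketched as a small Python tally. This is only an illustration of the threshold described above (at least three binding +1 votes, where binding voters are a given set such as the Lucene PMC); the voter IDs are hypothetical, the "latest vote wins" handling of repeat votes is an assumption, and this is not official ASF voting tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vote:
    voter: str  # hypothetical voter ID
    value: int  # +1, 0 or -1


def tally(votes: list[Vote], binding_voters: set[str], threshold: int = 3) -> tuple[int, bool]:
    """Count binding +1 votes and report whether the threshold is met."""
    # Assumption: if someone votes more than once, their latest vote counts.
    latest: dict[str, int] = {}
    for vote in votes:
        latest[vote.voter] = vote.value
    binding_plus_ones = sum(
        1 for voter, value in latest.items() if voter in binding_voters and value == 1
    )
    return binding_plus_ones, binding_plus_ones >= threshold
```

Non-binding votes (for example, from newly minted Nutch PMC members in this thread's framing) are accepted as input but simply not counted toward the threshold.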