Re: [VOTE] Apache Nutch 1.1 Release Candidate #2

2010-04-26 Thread Grant Ingersoll
Might I suggest that, since Nutch is now a TLP, you delay this release by a
few weeks and have the vote done under the auspices of the Nutch PMC?

Cheers,
Grant

On Apr 26, 2010, at 1:55 AM, Mattmann, Chris A (388J) wrote:

 Hi Folks,
 
 I have posted an updated candidate for the Apache Nutch 1.1 release. The
 source code is at:
 
 http://people.apache.org/~mattmann/apache-nutch-1.1/rc2/
 
 The major difference between this release and rc #1 is the application of
 NUTCH-812 - Crawl.java incorrectly uses the Generator API resulting in NPE -
 as well as some commits by Sami Siren to fix missing ASL license headers.
 
 For more detailed information on release contents and latest changes, see
 the included CHANGES.txt file. The release was made using the Nutch
 release process, documented on the Wiki here:
 
 http://bit.ly/d5ugid
 
 A Nutch 1.1 tag is at:
 
 http://svn.apache.org/repos/asf/lucene/nutch/tags/1.1/
 
 <note>
 There was a request by Sami Siren that the tutorial be updated to reflect
 the fact that this release is a source-only release, as well as a request to
 integrate RAT into the build. However, in the interest of getting this 1.1
 out and getting going on the Nutch TLP, my proposal is:
 
 * update the docs independent of this release (the tutorial as it exists
 right now says 0.7 on it anyways and doesn't look like it's been updated in
 a while, so I think users can live with what's there and support on
 u...@nutch.apache.org or d...@nutch.apache.org until it's updated)
 
 * begin source-only releases in general since we've long had the debate as
 to the size of the Nutch release. Most folks that use Nutch are likely
 familiar with running ant IMHO.
 
 * run RAT and integrate into the build
 
 </note>
 
 Please vote on releasing these packages as Apache Nutch 1.1. The vote is
 open for the next 72 hours.
 
 Since Nutch is now a TLP and has its own PMC, there is a question of who are
 the binding release VOTES in this particular thread. My gut reaction is that
 since I started this release while we were under the Lucene PMC, for
 continuity purposes, only votes from Lucene PMC are binding, but everyone
 (especially newly minted Nutch PMC members!) is welcome to check the
 release candidate and voice their approval or disapproval. The vote passes
 if at least three binding +1 votes are cast.
 
 [ ] +1 Release the packages as Apache Nutch 1.1.
 
 [ ] -1 Do not release the packages because...
 
 Thanks!
 
 Cheers,
 Chris
 
 P.S. Here is my +1.
 
 ++
 Chris Mattmann, Ph.D.
 Senior Computer Scientist
 NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
 Office: 171-266B, Mailstop: 171-246
 Email: chris.mattm...@jpl.nasa.gov
 WWW:   http://sunset.usc.edu/~mattmann/
 ++
 Adjunct Assistant Professor, Computer Science Department
 University of Southern California, Los Angeles, CA 90089 USA
 ++
 
 
 




Re: Running ANT; was -- Re: [VOTE] Apache Nutch 1.1 Release Candidate #2

2010-04-26 Thread Mattmann, Chris A (388J)
Hi David,

Thanks. In fact, running ant is probably simpler than running Nutch. The steps
would be:


 *   what OS are you on? (Ant is available for all of them, to my knowledge.)
 *   if you need ant, grab a distro from ant.apache.org; otherwise, I'll
     assume that you've got ant installed and callable from the command line.
 *   unpack the nutch src distribution, cd into that directory, type
     "ant job", and there you go.

HTH! You could try it out by taking the Nutch src code from SVN at: 
http://svn.apache.org/repos/asf/lucene/nutch/tags/1.1, and then trying the 
steps above.
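
If it helps, the steps above could also be scripted. What follows is only a
rough Python sketch, not part of the Nutch distribution: the function names
are made up for illustration, and it assumes the unpacked source tree has
Ant's default build.xml at its top level.

```python
import shutil
import subprocess
from pathlib import Path


def ant_job_command(target="job"):
    """Return the argv for the build step ("ant job" by default)."""
    return ["ant", target]


def build_nutch(src_dir):
    """Run "ant job" inside an unpacked Nutch source tree (hypothetical helper)."""
    # Step 2 of the list: ant must be installed and callable from the command line.
    if shutil.which("ant") is None:
        raise RuntimeError("ant not found on PATH; grab a distro from ant.apache.org")
    # Assumption: the source tree carries Ant's default build file at its root.
    if not (Path(src_dir) / "build.xml").is_file():
        raise FileNotFoundError("no build.xml in %s; is this the Nutch src tree?" % src_dir)
    # Step 3 of the list: cd into the directory and type "ant job".
    subprocess.run(ant_job_command(), cwd=src_dir, check=True)
```

Calling build_nutch() with the path of the unpacked source directory would
then be the equivalent of typing "ant job" there by hand.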

Cheers,
Chris


On 4/26/10 7:24 AM, David M. Cole <d...@colegroup.com> wrote:

At 10:55 PM -0700 4/25/10, Mattmann, Chris A (388J) wrote:
 Most folks that use Nutch are likely
 familiar with running ant IMHO.

I guess then I fall into the category of not most folks. Have been
running Nutch for about 14 months and I haven't a clue how to run ant.

If there's a place to vote to suggest that compiled versions still be
distributed, I vote for that.

Thanks.

\dmc

--
*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+
David M. Cole                                        d...@colegroup.com
Editor & Publisher, NewsInc. http://newsinc.net      V: (650) 557-2993
Consultant: The Cole Group http://colegroup.com/     F: (650) 475-8479
*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+






Re: [VOTE] Apache Nutch 1.1 Release Candidate #2

2010-04-26 Thread Mattmann, Chris A (388J)
Hi Grant,

Thanks. I think it actually makes sense to finish off 1.1. Since there is
overlap between the Nutch PMC and the Lucene PMC, and since the thread started
in Lucene before the TLP, I think it would be great if, e.g., Andrzej and Sami
could check the release. That way we still have the continuity and can
safely push it out as the last Nutch release under the Lucene umbrella...

Then all releases post 1.1 can cleanly be done under the auspices of the new 
PMC :)

Cheers,
Chris


On 4/26/10 5:34 AM, Grant Ingersoll <gsing...@apache.org> wrote:

Might I suggest that, since Nutch is now a TLP, you delay this release by a
few weeks and have the vote done under the auspices of the Nutch PMC?

Cheers,
Grant




Re: [VOTE] Apache Nutch 1.1 Release Candidate #2

2010-04-26 Thread Mattmann, Chris A (388J)
Hey Andrzej,

Okey dokey, np! Let's get the patch in first :) I can cut as many RCs as needed.

Cheers,
Chris

On 4/26/10 11:30 AM, Andrzej Bialecki <a...@getopt.org> wrote:

On 2010-04-26 17:19, Mattmann, Chris A (388J) wrote:
 Hi Grant,

 Thanks. I think it actually makes sense to finish off 1.1, and since there is 
 overlap with the Nutch PMC and the Lucene PMC and since the thread started in 
 Lucene before the TLP, I think it would be great e.g., if Andrzej, and Sami 
 could check the release and that way we still have the continuity and can 
 safely push it out as the last Nutch rel under the Lucene umbrella...

 Then all releases post 1.1 can cleanly be done under the auspices of the new 
 PMC :)

I know that Dennis Kubes just discovered a bug in SegmentMerger (he may
report on it in a moment) - this bug has been there for a while, it's
likely the cause of the mysterious out of disk space errors, and it
manifests itself only with input files larger than HDFS block size
(64MB). Since 1.1 is likely the final release of Nutch 1.x I think it
would make sense to fix this bug before we release ...

--
Best regards,
Andrzej Bialecki 
 ___. ___ ___ ___ _ _   __
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com







[VOTE] Apache Nutch 1.1 Release Candidate #2

2010-04-25 Thread Mattmann, Chris A (388J)
Hi Folks,

I have posted an updated candidate for the Apache Nutch 1.1 release. The
source code is at:

http://people.apache.org/~mattmann/apache-nutch-1.1/rc2/

The major difference between this release and rc #1 is the application of
NUTCH-812 - Crawl.java incorrectly uses the Generator API resulting in NPE -
as well as some commits by Sami Siren to fix missing ASL license headers.

For more detailed information on release contents and latest changes, see
the included CHANGES.txt file. The release was made using the Nutch
release process, documented on the Wiki here:

http://bit.ly/d5ugid

A Nutch 1.1 tag is at:

http://svn.apache.org/repos/asf/lucene/nutch/tags/1.1/

<note>
There was a request by Sami Siren that the tutorial be updated to reflect
the fact that this release is a source-only release, as well as a request to
integrate RAT into the build. However, in the interest of getting this 1.1
out and getting going on the Nutch TLP, my proposal is:

* update the docs independent of this release (the tutorial as it exists
right now says 0.7 on it anyways and doesn't look like it's been updated in
a while, so I think users can live with what's there and support on
u...@nutch.apache.org or d...@nutch.apache.org until it's updated)

* begin source-only releases in general since we've long had the debate as
to the size of the Nutch release. Most folks that use Nutch are likely
familiar with running ant IMHO.

* run RAT and integrate into the build

</note>

Please vote on releasing these packages as Apache Nutch 1.1. The vote is
open for the next 72 hours.

Since Nutch is now a TLP and has its own PMC, there is a question of who are
the binding release VOTES in this particular thread. My gut reaction is that
since I started this release while we were under the Lucene PMC, for
continuity purposes, only votes from Lucene PMC are binding, but everyone
(especially newly minted Nutch PMC members!) is welcome to check the
release candidate and voice their approval or disapproval. The vote passes
if at least three binding +1 votes are cast.
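
To make the pass condition concrete, here is a minimal Python sketch of the
rule as stated in this message only (at least three binding +1 votes). The
Vote type, the function name and the voter labels are hypothetical, and any
further ASF voting rules not mentioned here are deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vote:
    voter: str      # placeholder label, not a real voter
    value: int      # +1 or -1
    binding: bool   # True if cast by a member of the PMC whose votes bind


def vote_passes(votes, required_binding_plus_ones=3):
    """Apply only the rule from this thread: count the binding +1 votes."""
    binding_plus_ones = sum(1 for v in votes if v.binding and v.value == 1)
    return binding_plus_ones >= required_binding_plus_ones


# Illustrative ballots only; they do not reflect any actual votes.
ballots = [
    Vote("voter-a", +1, binding=True),
    Vote("voter-b", +1, binding=True),
    Vote("voter-c", +1, binding=False),
]
print(vote_passes(ballots))
```

With only two binding +1 votes among the sample ballots, the check reports
that the threshold has not been reached; non-binding votes are recorded but
do not count toward it.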

[ ] +1 Release the packages as Apache Nutch 1.1.

[ ] -1 Do not release the packages because...

Thanks!

Cheers,
Chris

P.S. Here is my +1.

++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.mattm...@jpl.nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++