[VOTE] Apache Nutch 1.1 Release Candidate #3

2010-05-08 Thread Mattmann, Chris A (388J)
Hi Folks,

I have posted an updated candidate for the Apache Nutch 1.1 release. The
source code is at:

http://people.apache.org/~mattmann/apache-nutch-1.1/rc3/

The major differences between this release and rc #2 are the application of:
NUTCH-816, NUTCH-732, NUTCH-815, NUTCH-814, and NUTCH-812 based on feedback
from prior release candidates.

For details on release contents and the latest changes, see the included
CHANGES.txt file. The release was made using the Nutch release process,
documented on the Wiki here:

http://bit.ly/d5ugid

A Nutch 1.1 tag is at:

http://svn.apache.org/repos/asf/lucene/nutch/tags/1.1/

<note>
In response to several user requests during the last RC cycle, I've also
included *binary* releases (labeled apache-nutch-1.1-bin.tar.gz and
apache-nutch-1.1-bin.zip). Since the release is therefore no longer
source-only, this should address Sami Siren's request that the tutorial be
updated to reflect a source-only release.

Sami also requested that RAT be integrated into the build. However, in the
interest of getting 1.1 out and getting going on the Nutch TLP, my
proposal is:

* run RAT and integrate it into the build for releases after 1.1

</note>

Please vote on releasing these packages as Apache Nutch 1.1. The vote is
open for the next 72 hours.

Only votes from the Nutch PMC are binding, but folks are welcome to check the
release candidate and voice their approval or disapproval. The vote passes
if at least three binding +1 votes are cast.

[ ] +1 Release the packages as Apache Nutch 1.1.

[ ] -1 Do not release the packages because...
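
As a rough illustration of the passing rule stated above (at least three binding +1 votes), here is a minimal Python sketch. The `Vote` type, the voter labels, and the function name are hypothetical and exist only for this example. The sketch encodes only the rule as written in this email; it does not model how -1 votes are weighed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vote:
    voter: str     # hypothetical label, not a real voter
    value: int     # +1 or -1
    binding: bool  # True if the voter is on the PMC whose votes are binding


def release_vote_passes(votes, required_binding_plus_ones=3):
    """Return True if enough binding +1 votes were cast.

    Only the rule quoted in the email is checked: the count of
    binding +1 votes must reach the required minimum.
    """
    binding_plus_ones = sum(1 for v in votes if v.binding and v.value == 1)
    return binding_plus_ones >= required_binding_plus_ones


# Hypothetical ballot: three binding +1s plus two non-binding votes.
example_votes = [
    Vote("pmc-member-a", +1, binding=True),
    Vote("pmc-member-b", +1, binding=True),
    Vote("pmc-member-c", +1, binding=True),
    Vote("user-d", +1, binding=False),
    Vote("user-e", -1, binding=False),
]

if __name__ == "__main__":
    print(release_vote_passes(example_votes))
```

Non-binding votes are kept in the list but ignored by the count, which mirrors the statement that everyone may vote while only PMC votes are binding.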

Thanks!

Cheers,
Chris

P.S. Here is my +1.

++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.mattm...@jpl.nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++




Re: Running ANT; was -- Re: [VOTE] Apache Nutch 1.1 Release Candidate #2

2010-04-26 Thread Mattmann, Chris A (388J)
Hi David,

Thanks. In fact, running ant is probably simpler than running Nutch. The steps 
would be:


 *   what OS are you on? (Ant is available for all of them, to my knowledge.)
 *   if you need ant, grab a distro from ant.apache.org; otherwise, I'll
     assume that you've got ant installed and callable from the command line.
 *   unpack the nutch src distribution, cd into that directory, type
     "ant job", and there you go.

HTH! You could try it out by taking the Nutch src code from SVN at
http://svn.apache.org/repos/asf/lucene/nutch/tags/1.1 and then trying the
steps above.

Cheers,
Chris


On 4/26/10 7:24 AM, David M. Cole d...@colegroup.com wrote:

At 10:55 PM -0700 4/25/10, Mattmann, Chris A (388J) wrote:
Most folks that use Nutch are likely
familiar with running ant IMHO.

I guess then I fall into the category of not most folks. Have been
running Nutch for about 14 months and I haven't a clue how to run ant.

If there's a place to vote to suggest that compiled versions still be
distributed, I vote for that.

Thanks.

\dmc

--
*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+
David M. Cole                                    d...@colegroup.com
Editor & Publisher, NewsInc. http://newsinc.net   V: (650) 557-2993
Consultant: The Cole Group http://colegroup.com/   F: (650) 475-8479
*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+






Re: [VOTE] Apache Nutch 1.1 Release Candidate #2

2010-04-26 Thread Mattmann, Chris A (388J)
Hi Grant,

Thanks. I think it actually makes sense to finish off 1.1. Since there is
overlap between the Nutch PMC and the Lucene PMC, and since the thread started
in Lucene before the TLP, it would be great if, e.g., Andrzej and Sami could
check the release. That way we keep the continuity and can safely push it out
as the last Nutch release under the Lucene umbrella...

Then all releases post 1.1 can cleanly be done under the auspices of the new
PMC :)

Cheers,
Chris


On 4/26/10 5:34 AM, Grant Ingersoll gsing...@apache.org wrote:

Might I suggest that, since Nutch is now a TLP, you delay this release by a
few weeks and have the vote done under the auspices of the Nutch PMC?

Cheers,
Grant




Re: [VOTE] Apache Nutch 1.1 Release Candidate #2

2010-04-26 Thread Mattmann, Chris A (388J)
Hey Andrzej,

Okey dokey, np! Let's get the patch in first :) I can cut as many RCs as needed.

Cheers,
Chris

On 4/26/10 11:30 AM, Andrzej Bialecki a...@getopt.org wrote:

On 2010-04-26 17:19, Mattmann, Chris A (388J) wrote:
 Hi Grant,

 Thanks. I think it actually makes sense to finish off 1.1. Since there is
 overlap between the Nutch PMC and the Lucene PMC, and since the thread started
 in Lucene before the TLP, it would be great if, e.g., Andrzej and Sami could
 check the release. That way we keep the continuity and can safely push it out
 as the last Nutch release under the Lucene umbrella...

 Then all releases post 1.1 can cleanly be done under the auspices of the new 
 PMC :)

I know that Dennis Kubes just discovered a bug in SegmentMerger (he may
report on it in a moment) - this bug has been there for a while, it's
likely the cause of the mysterious out of disk space errors, and it
manifests itself only with input files larger than HDFS block size
(64MB). Since 1.1 is likely the final release of Nutch 1.x I think it
would make sense to fix this bug before we release ...

--
Best regards,
Andrzej Bialecki 
 ___. ___ ___ ___ _ _   __
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com







[VOTE] Apache Nutch 1.1 Release Candidate #2

2010-04-25 Thread Mattmann, Chris A (388J)
Hi Folks,

I have posted an updated candidate for the Apache Nutch 1.1 release. The
source code is at:

http://people.apache.org/~mattmann/apache-nutch-1.1/rc2/

The major difference between this release and rc #1 is the application of
NUTCH-812 - Crawl.java incorrectly uses the Generator API resulting in NPE -
as well as some commits by Sami Siren to fix missing ASL license headers.

For details on release contents and the latest changes, see the included
CHANGES.txt file. The release was made using the Nutch release process,
documented on the Wiki here:

http://bit.ly/d5ugid

A Nutch 1.1 tag is at:

http://svn.apache.org/repos/asf/lucene/nutch/tags/1.1/

<note>
Sami Siren requested that the tutorial be updated to reflect the fact that
this release is a source-only release, and that RAT be integrated into the
build. However, in the interest of getting 1.1 out and getting going on the
Nutch TLP, my proposal is:

* update the docs independent of this release (the tutorial as it exists
right now says 0.7 on it anyway and doesn't look like it's been updated in
a while, so I think users can live with what's there and with support on
u...@nutch.apache.org or d...@nutch.apache.org until it's updated)

* begin source-only releases in general, since we've long had the debate as
to the size of the Nutch release. Most folks that use Nutch are likely
familiar with running ant IMHO.

* run RAT and integrate it into the build

</note>

Please vote on releasing these packages as Apache Nutch 1.1. The vote is
open for the next 72 hours.

Since Nutch is now a TLP and has its own PMC, there is a question of whose
votes are binding in this particular thread. My gut reaction is that,
since I started this release while we were under the Lucene PMC, for
continuity purposes only votes from the Lucene PMC are binding, but everyone
(especially newly minted Nutch PMC members!) is welcome to check the
release candidate and voice their approval or disapproval. The vote passes
if at least three binding +1 votes are cast.

[ ] +1 Release the packages as Apache Nutch 1.1.

[ ] -1 Do not release the packages because...

Thanks!

Cheers,
Chris

P.S. Here is my +1.






Re: [VOTE 2] Board resolution for Nutch as TLP

2010-04-12 Thread Mattmann, Chris A (388J)
+1, thanks for pushing this forward Andrzej!

Cheers,
Chris


On 4/12/10 4:32 AM, Doğacan Güney doga...@gmail.com wrote:

On Mon, Apr 12, 2010 at 14:08, Andrzej Bialecki a...@getopt.org wrote:
 Hi,

 Take two, after s/crawling/search/ ...

 Following the discussion, below is the text of the proposed Board
 Resolution to vote upon.

 [] +1.  Request the Board make Nutch a TLP
 [] +0.  I don't feel strongly about it, but I'm okay with this.
 [] -1.  No, don't request the Board make Nutch a TLP, and here are my
  reasons...

 This is a majority count vote (i.e. no vetoes). The vote is open for 72
 hours.

 Here's my +1.

And here is my +1.


 ===
 X. Establish the Apache Nutch Project

 WHEREAS, the Board of Directors deems it to be in the best
 interests of the Foundation and consistent with the
 Foundation's purpose to establish a Project Management
 Committee charged with the creation and maintenance of
 open-source software related to a large-scale web search
 platform for distribution at no charge to the public.

 NOW, THEREFORE, BE IT RESOLVED, that a Project Management
 Committee (PMC), to be known as the Apache Nutch Project,
 be and hereby is established pursuant to Bylaws of the
 Foundation; and be it further

 RESOLVED, that the Apache Nutch Project be and hereby is
 responsible for the creation and maintenance of software
 related to a large-scale web search platform; and be it further

 RESOLVED, that the office of Vice President, Apache Nutch be
 and hereby is created, the person holding such office to
 serve at the direction of the Board of Directors as the chair
 of the Apache Nutch Project, and to have primary responsibility
 for management of the projects within the scope of
 responsibility of the Apache Nutch Project; and be it further

 RESOLVED, that the persons listed immediately below be and
 hereby are appointed to serve as the initial members of the
 Apache Nutch Project:

• Andrzej Bialecki a...@...
• Otis Gospodnetic o...@...
• Dogacan Guney doga...@...
• Dennis Kubes ku...@...
• Chris Mattmann mattm...@...
• Julien Nioche jnio...@...
• Sami Siren si...@...

 RESOLVED, that the Apache Nutch Project be and hereby
 is tasked with the migration and rationalization of the Apache
 Lucene Nutch sub-project; and be it further

 RESOLVED, that all responsibilities pertaining to the Apache
 Lucene Nutch sub-project encumbered upon the
 Apache Lucene Project are hereafter discharged.

 NOW, THEREFORE, BE IT FURTHER RESOLVED, that Andrzej Bialecki
 be appointed to the office of Vice President, Apache Nutch, to
 serve in accordance with and subject to the direction of the
 Board of Directors and the Bylaws of the Foundation until
 death, resignation, retirement, removal or disqualification,
 or until a successor is appointed.
 ===









--
Doğacan Güney






Re: [DISCUSS] Board resolution for Nutch as TLP

2010-04-11 Thread Mattmann, Chris A (388J)
Hi Dogacan,

+1 to calling it a web search platform, since I agree, it’s not just a
crawler.

Cheers,
Chris


On 4/11/10 11:40 AM, Doğacan Güney doga...@gmail.com wrote:

 Hi,
 
 On Sat, Apr 10, 2010 at 16:32, Jukka Zitting jukka.zitt...@gmail.com wrote:
 Hi,
 
 On Fri, Apr 9, 2010 at 6:52 PM, Andrzej Bialecki a...@getopt.org wrote:
 WHEREAS, the Board of Directors deems it to be in the best
 interests of the Foundation and consistent with the
 Foundation's purpose to establish a Project Management
 Committee charged with the creation and maintenance of
 open-source software related to a large-scale web crawling
 platform for distribution at no charge to the public.
 
 Would it make sense to simplify the scope to ... open-source software
 related to large-scale web crawling for distribution at no charge to
 the public?
 
 
 Actually, shouldn't that be something like web search platform, or maybe a
 crawling and search platform? Nutch is not just a crawler.
 
 Anyway, +1 from me.
 
 BR,
 
 Jukka Zitting
 
 
 
 
 --
 Doğacan Güney
 






Re: Adding jpeg parser to nutch

2010-04-10 Thread Mattmann, Chris A (388J)
Hi David,

The latest Nutch release candidate (1.1,
http://svn.apache.org/repos/asf/lucene/nutch/tags/1.1) includes the Tika parser
plugin (parse-tika), which provides a JpegParser (see here:
http://bit.ly/b0zRX8) that will hopefully suit your needs.

Let me know what you think.

Cheers,
Chris


On 4/10/10 6:56 AM, Gombkötő Dávid madav...@gmail.com wrote:

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Hello.

I'm working on a school task, which is to modify Nutch so that it can
identify and download JPEGs, create a thumbnail, and index the URLs of
these JPEGs with the other crawl results, so that the web interface can
show images as well.

At the start I found that ParserNotFound.java can do the trick for me.
I modified the constructor so that it matches the end of the URL against a
pattern, and if the URL ends in jpeg it creates a file named after the
md5sum of the URL, in a directory on my filesystem, and writes the URL into
it. Well... this is ugly. I wanted to add the working directory
to ParserNotFound.java, but I couldn't. To move forward with my
work, I first need to know how to make my own JPEG parser. After
that I would like to index my results somehow :)

So, my question: how can I add my JPEG parser? Or, how can I add a new
parser to the Nutch system? Thanks for your answers.

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iQEcBAEBAgAGBQJLwIObAAoJEIJu8h6i9aAHb6AH/jegl+oqvUg8nJCJo1p/IuVx
KuWthxGn0S+qDMfXrYb+AIRpmuj2YAWQwEE9Lhw2ftSJwFqH4gf4VwmDJq8CDTto
BDX+/lOOI7ZVtKzNmDgaN2nwX0gwn0PJgKTV8BGkUbVy3McfisQ/9v9UBzhjj7f7
DTvsZN2yNyv9PUls9GSqXw9czFsuKB7PLGnssqB6a8DTgFeoLT2F8e0B9q2Tht92
eAZV2awEnnH/wNTIjfwO00YXNdvNcGANiFzz0v4CoMekSEigoRBSemtYhsYCOppo
S0OUy8SCT4A2B6sWADIQjMKgnWuLm53dkHl9D91p0zMpnCTcq5u3hjLnxgq69L8=
=M7VY
-END PGP SIGNATURE-
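
The file-naming scheme David describes above (one file per JPEG URL, named after the MD5 of the URL and containing the URL) can be sketched in a few lines. This is only an illustration in Python, not Nutch code and not his actual patch: the function name `record_jpeg_url`, the output directory, and the decision to accept `.jpg` as well as `.jpeg` are all assumptions made for the example.

```python
import hashlib
import tempfile
from pathlib import Path


def record_jpeg_url(url, out_dir):
    """Write `url` to a file named after its MD5 hex digest.

    Returns the path of the written file, or None when the URL does not
    look like a JPEG (suffix check only; accepting ".jpg" is an assumption).
    """
    if not url.lower().endswith((".jpg", ".jpeg")):
        return None
    name = hashlib.md5(url.encode("utf-8")).hexdigest()
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    target = out_path / name
    target.write_text(url + "\n", encoding="utf-8")
    return target


if __name__ == "__main__":
    # Demo against a throwaway directory; example.com URLs are placeholders.
    with tempfile.TemporaryDirectory() as tmp:
        print(record_jpeg_url("http://example.com/photo.jpeg", tmp))
        print(record_jpeg_url("http://example.com/index.html", tmp))
```

Hashing the URL gives a fixed-length, filesystem-safe name, which is presumably why the approach was chosen; a parser plugin, as suggested in the reply, avoids the side channel through the filesystem altogether.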






Re: [DISCUSS] Board resolution for Nutch as TLP

2010-04-09 Thread Mattmann, Chris A (388J)
Hi Andrzej,

+1, with the following amendment:

 
 RESOLVED, that all responsibilities pertaining to the Apache
 Lucene Nutch sub-project encumbered upon the
 Apache Nutch Project are hereafter discharged.

This should read:

 RESOLVED, that all responsibilities pertaining to the Apache
 Lucene Nutch sub-project encumbered upon the
 Apache Lucene Project are hereafter discharged.

Cheers,
Chris






Re: release of 1.1?

2010-04-06 Thread Mattmann, Chris A (388J)
Thanks Julien!

OK, I'll cut the RC at some point today. Thanks!

Cheers,
Chris


On 4/6/10 4:47 AM, Julien Nioche lists.digitalpeb...@gmail.com wrote:

Chris,

Just to let you know that I have committed 
https://issues.apache.org/jira/browse/NUTCH-810 which was the last open issue 
before the release of 1.1

Thanks

Julien





[VOTE] Apache Nutch 1.1 Release Candidate #1

2010-04-06 Thread Mattmann, Chris A (388J)
Hi Folks,

I have posted a candidate for the Apache Nutch 1.1 release. The source code
is at:

http://people.apache.org/~mattmann/apache-nutch-1.1/rc1/

See the included CHANGES.txt file for details on release contents and latest
changes. The release was made using the Nutch release process, documented on
the Wiki here:

http://bit.ly/d5ugid

A Nutch 1.1 tag is at:

http://svn.apache.org/repos/asf/lucene/nutch/tags/1.1/

Please vote on releasing these packages as Apache Nutch 1.1. The vote is
open for the next 72 hours. Only votes from Lucene PMC are binding, but
everyone is welcome to check the release candidate and voice their approval
or disapproval. The vote passes if at least three binding +1 votes are cast.

[ ] +1 Release the packages as Apache Nutch 1.1.

[ ] -1 Do not release the packages because...

Thanks!

Cheers,
Chris





Re: Question: Nutch 0.8.2 and Nutch 0.7.3?

2010-04-04 Thread Mattmann, Chris A (388J)
Hey Andrzej,

 http://svn.apache.org/repos/asf/lucene/nutch/branches/branch-0.8/
 
 That's the code that was intended to become 0.8.2 ...
 
 However, I'm not sure whether there's any benefit in releasing either of
 these. Those who really had the need to track this branch (or 0.7)
 likely used the code from this branch even though it wasn't released.
 And I believe we are not interested in maintaining a new release based
 on this code...?

No problem, just wanted to gauge interest. Is everyone OK with me closing
out those releases in JIRA, then?

Cheers,
Chris





Question: Nutch 0.8.2 and Nutch 0.7.3?

2010-04-03 Thread Mattmann, Chris A (388J)
Hey Guys,

Question. I see 2 releases that haven't been cut in JIRA:

0.8.2:
https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&pid=10680&fixfor=12312064

0.7.3:
https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&pid=10680&fixfor=12312176

I'm happy to cut 0.8.2 as part of the 1.1 effort, to get it out the door.
However, I have a question: is this Nutch 0.8.2 in SVN?

http://svn.apache.org/repos/asf/lucene/nutch/branches/branch-0.8/

Nutch 0.7.3 has no issues associated with it, so should I remove it? It seems
to have been created a few years ago, and I don't think it has active
maintenance or a user base.

Cheers,
Chris





Re: [VOTE] Apache Tika 0.7 Release Candidate #1

2010-04-02 Thread Mattmann, Chris A (388J)
(apologies for the cross-post, but this impacts Nutch 1.1, so just wanted
folks to see it)

* +1 on extending the deadline until Monday, April 5th. Right now, we have 3
+1s, so technically we could still do the 72 hrs and still be OK, but I'm
fine with giving folks some more time to take a look
* Thanks to jzitting and gsingers for taking a look and voting so far
* Once Tika 0.7 is out the door, I will move forward on pushing out a Nutch
1.1 RC (after we upgrade Nutch to use Tika 0.7 -- Julien, help? :) ). That
OK, Nutchers?
* Thanks for comments on the CHANGES from gsingers, and the mention to
include the sha1 of the src archive from jzitting. Will do on both, going
forward. 
* +1 for having a direct link to tika-app on the website.
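
For the SHA1 checksum mentioned above, a minimal sketch of computing one for a release archive follows. It uses only Python's standard `hashlib`; the function names are made up for this example, and the archive file name in the comment is a placeholder, not the actual artifact name.

```python
import hashlib
import io


def sha1_of_stream(stream, chunk_size=1 << 20):
    """Return the SHA1 hex digest of a binary stream, read in chunks."""
    digest = hashlib.sha1()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def sha1_of_file(path):
    """Return the SHA1 hex digest of the file at `path`."""
    with open(path, "rb") as f:
        return sha1_of_stream(f)


if __name__ == "__main__":
    # In-memory demo; for a real archive one would call, for example,
    # sha1_of_file("release-src.tar.gz")  (placeholder file name).
    print(sha1_of_stream(io.BytesIO(b"abc")))
```

Reading in chunks keeps memory use flat for large archives. The resulting hex string is what would typically be published next to the archive so that downloaders can compare it against their own computation.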

Cheers,
Chris




On 4/1/10 11:41 PM, Jukka Zitting jukka.zitt...@gmail.com wrote:

 Hi,
 
 On Wed, Mar 31, 2010 at 10:01 PM, Mattmann, Chris A (388J)
 chris.a.mattm...@jpl.nasa.gov wrote:
 Please vote on releasing these packages as Apache Tika 0.7.
 
 +1 Thanks!
 
 Some minor notes:
 * It would be good to have also a SHA1 checksum for the release archive.
 * Perhaps we should start offering also the tika-app jar as a direct
 download from l.a.o/tika/download.html?
 
 The vote is open for the next 72 hours.
 
 It looks like people.apache.org is not accessible at the moment (I
 downloaded the release candidate yesterday), so it might be a good
 idea to extend the vote period over the Easter holidays.
 
 BR,
 
 Jukka Zitting
 






Re: [VOTE] Apache Tika 0.7 Release Candidate #1

2010-04-02 Thread Mattmann, Chris A (388J)
Hey Jukka,

Sounds good to me then if no one else objects.

I'll wait the 72 hrs (Sat, 4:01 PM EST) and then assuming the VOTE passes, roll 
the releases out to the mirrors and then work on Nutch 1.1.

Cheers,
Chris



On 4/2/10 11:41 AM, Jukka Zitting jukka.zitt...@gmail.com wrote:

Hi,

On Fri, Apr 2, 2010 at 4:14 PM, Mattmann, Chris A (388J)
chris.a.mattm...@jpl.nasa.gov wrote:
 +1s, so technically we could still do the 72 hrs and still be OK, but I'm
 fine with giving folks some more time to take a look

I'm fine with closing the vote already at 72 hours since the p.a.o
outage only seemed to last a few hours.

Jukka






Re: 1.1 release?

2010-03-31 Thread Mattmann, Chris A (388J)
Hey Guys,

OK, I'm finally getting around to this: I am going to push all the current 1.1
JIRA issues out and set their fix version to nil. Once I'm done with this,
I'll wait 48 hrs to see if there is anything that anyone really wants to get
into 1.1. So, please, take a look here [1] and make sure that if you wanted
your issue in 1.1, it's there.

After 48 hours, I'll make one more announcement, and wait 24 hours before 
cutting the 1.1 RC and pushing to people.a.o for review. Here I go!

Cheers,
Chris



[1] http://bit.ly/cNehBc


On 3/9/10 10:54 AM, Andrzej Bialecki a...@getopt.org wrote:

On 2010-03-09 18:17, Julien Nioche wrote:
 Hi Chris,

 Excellent idea! There have been quite a few changes since 1.0 and it's
 probably the right time to have a new release.

+1. Let's just check JIRA and make sure we didn't forget anything
important ...


 Not really a blocker but https://issues.apache.org/jira/browse/NUTCH-762
 would be nice to have in 1.1, just needs a bit of reviewing / testing I
 suppose. Otherwise this can wait until after 1.1

I'll try to test it before the weekend.








Re: [DISCUSS] Nutch as a top level project (TLP)?

2010-03-20 Thread Mattmann, Chris A (388J)
Hey Andrzej,

I'd be +1 for Nutch being a TLP. I don't think it'll change much, other than
providing more visibility and allowing more focused decision making by the
folks in the Nutch community. The infrastructure work required to move to TLP
status is moving the mailing lists, JIRA, SVN, and the website (with a bit of
redesign). That shouldn't be that hard, and the infra team can probably help
with at least the first three if we file issues for them.

I'd volunteer to help with things like list moderation, or whatever else I can 
do to help.

The important things to decide would be:


 *   Who's on the PMC (my suggestion: similar to Tika, make existing Nutch
     committers PMC members)
 *   Who's the VP (my +1 for you)

Cheers,
Chris



On 3/19/10 12:51 PM, Andrzej Bialecki a...@getopt.org wrote:

Hi devs,

The ASF Board indicated recently that so called umbrella projects,
i.e. projects that host many significant sub-projects, should examine
their structure towards simplification, such as merging or splitting out
sub-projects.

Lucene TLP is such a project. Recently the Lucene PMC accepted the merge
of Solr and Lucene core projects. Mahout project will most likely split
to its own TLP soon. Which leaves Nutch as a sort of odd duck ;)

Moving Nutch to its own TLP has some advantages, mostly an easier
decision process - voting on new committers and new releases involves
then only those who participate directly in Nutch dev., i.e. the Nutch
community.

Also, from the coding point of view, Nutch is not so intrinsically tied to
Lucene development that the two would require careful
coordination - we just use Lucene as one of many dependencies, and in
fact we aim to cleanly separate the Nutch search API from the Lucene-based API.
I can easily imagine Nutch dropping completely the low-level
Lucene-based components and moving to a more general search fabric (e.g.
SolrCloud).

Being its own TLP could also give Nutch more exposure and help to
crystallize our mission.

There are some disadvantages to such a split, too: we would need to
spend some more effort on various administrative tasks, and maintain a
separate web site (under Apache, but not under Lucene), and probably
some other tasks that I'm not yet aware of. This would also mean that
Nutch would have to stand on its own merit, which considering the small
number of active committers may be challenging.

Let's discuss this, and after we collect some pros and cons I'm going to
call for a vote.








1.1 release?

2010-03-09 Thread Mattmann, Chris A (388J)
Hey Guys,

I have some extra time this weekend and early next week. Want me to be the
RM and push out a 1.1 release? Any blockers? I'm happy to do it; just let me
know.

Cheers,
Chris






[ANNOUNCE] New Nutch Committer: Julien Nioche

2009-12-24 Thread Mattmann, Chris A (388J)
All,

A little while ago I nominated Julien Nioche to be a Nutch committer based on
his contributions to the Nutch project (10+ patches in this release alone,
and all the mailing list help and thoughtful design discussion). I'm happy
to announce that the Lucene PMC has voted to make Julien a Nutch committer!

Julien, welcome to the team. The typical first committer task is to modify
the Nutch Forrest credits page and add yourself to the website. If you'd
like to say something about yourself and your background, feel free to do so
as well.

Welcome!

Cheers,
Chris
