[ANNOUNCE] New Nutch Committer: Julien Nioche
All, A little while ago I nominated Julien Nioche to be a Nutch committer based on his contributions to the Nutch project (10+ patches in this release alone, and all the mailing list help and thoughtful design discussion). I'm happy to announce that the Lucene PMC has voted to make Julien a Nutch committer! Julien, welcome to the team. The typical first committer task is to modify the Nutch Forrest credits page and add yourself to the website. If you'd like to say something about yourself and your background, feel free to do so as well. Welcome! Cheers, Chris ++ Chris Mattmann, Ph.D. Senior Computer Scientist NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA Office: 171-266B, Mailstop: 171-246 Email: chris.mattm...@jpl.nasa.gov WWW: http://sunset.usc.edu/~mattmann/ ++ Adjunct Assistant Professor, Computer Science Department University of Southern California, Los Angeles, CA 90089 USA ++
1.1 release?
Hey Guys, I have some extra time this weekend and early next week. Want me to be the RM and push out a 1.1 release? Any blockers? I'm happy to do it just let me know. Cheers, Chris
Re: [DISCUSS] Nutch as a top level project (TLP)?
Hey Andrzej, I'd be +1 for Nutch being a TLP. I don't think it'll change much (other than to provide more visibility/etc., and to allow more focused decision making by the folks in the Nutch community). The infrastructure moves required to move to TLP status are moving mailing lists, moving JIRA, moving SVN, and moving the website (a bit of redesign/etc.), which shouldn't be that hard, and the infra team can probably help with (at least the first 3 parts if we file issues for them). I'd volunteer to help with things like list moderation, or whatever else I can do to help. The important things to decide would be: * Who's on the PMC (my suggestion, similar to Tika, make existing Nutch committers PMC members) * Who's the VP (my +1 for you) Cheers, Chris On 3/19/10 12:51 PM, Andrzej Bialecki a...@getopt.org wrote: Hi devs, The ASF Board indicated recently that so called umbrella projects, i.e. projects that host many significant sub-projects, should examine their structure towards simplification, such as merging or splitting out sub-projects. Lucene TLP is such a project. Recently the Lucene PMC accepted the merge of Solr and Lucene core projects. Mahout project will most likely split to its own TLP soon. Which leaves Nutch as a sort of odd duck ;) Moving Nutch to its own TLP has some advantages, mostly an easier decision process - voting on new committers and new releases involves then only those who participate directly in Nutch dev., i.e. the Nutch community. Also, from the coding point of view, Nutch is not intrinsically tied to the Lucene development as if both would require some careful coordination - we just use Lucene as one of many dependencies, and in fact we aim to cleanly separate Nutch search API from Lucene-based API. I can easily imagine Nutch dropping completely the low-level Lucene-based components and moving to a more general search fabric (e.g. SolrCloud). Being its own TLP could also give Nutch more exposure and help to crystallize our mission. 
There are some disadvantages to such a split, too: we would need to spend some more effort on various administrative tasks, and maintain a separate web site (under Apache, but not under Lucene), and probably some other tasks that I'm not yet aware of. This would also mean that Nutch would have to stand on its own merit, which considering the small number of active committers may be challenging. Let's discuss this, and after we collect some pros and cons I'm going to call for a vote. -- Best regards, Andrzej Bialecki, Information Retrieval, Semantic Web, Embedded Unix, System Integration http://www.sigram.com Contact: info at sigram dot com
Re: 1.1 release?
Hey Guys, OK I'm finally getting around to this: I am going to push all the current 1.1 JIRA issues out and set their fix version to nil. Once I'm done with this, I'll wait 48 hrs to see if there is anything that anyone really wants to get into 1.1. So, please, take a look here [1] and make sure that if you wanted your issue into 1.1, that it's there. After 48 hours, I'll make one more announcement, and wait 24 hours before cutting the 1.1 RC and pushing to people.a.o for review. Here I go! Cheers, Chris [1] http://bit.ly/cNehBc On 3/9/10 10:54 AM, Andrzej Bialecki a...@getopt.org wrote: On 2010-03-09 18:17, Julien Nioche wrote: Hi Chris, Excellent idea! There have been quite a few changes since 1.0 and it's probably the right time to have a new release. +1. Let's just check JIRA and make sure we didn't forget anything important ... Not really a blocker but https://issues.apache.org/jira/browse/NUTCH-762 would be nice to have in 1.1, just needs a bit of reviewing / testing I suppose. Otherwise this can wait until after 1.1 I'll try to test it before the weekend. -- Best regards, Andrzej Bialecki
Re: [VOTE] Apache Tika 0.7 Release Candidate #1
(apologies for the cross-post, but this impacts Nutch 1.1, so just wanted folks to see it) * +1 on extending the deadline until Monday, April 5th. Right now, we have 3 +1s, so technically we could still do the 72 hrs and still be OK, but I'm fine with giving folks some more time to take a look * Thanks to jzitting and gsingers for taking a look and voting so far * Once Tika 0.7 is out the door, I will move forward on pushing out a Nutch 1.1 RC (after we upgrade Nutch to use Tika 0.7 -- Julien, help? :) ). That OK, Nutchers? * Thanks for comments on the CHANGES from gsingers, and the mention to include the sha1 of the src archive from jzitting. Will do on both, going forward. * +1 for having a direct link to tika-app on the website. Cheers, Chris On 4/1/10 11:41 PM, Jukka Zitting jukka.zitt...@gmail.com wrote: Hi, On Wed, Mar 31, 2010 at 10:01 PM, Mattmann, Chris A (388J) chris.a.mattm...@jpl.nasa.gov wrote: Please vote on releasing these packages as Apache Tika 0.7. +1 Thanks! Some minor notes: * It would be good to have also a SHA1 checksum for the release archive. * Perhaps we should start offering also the tika-app jar as a direct download from l.a.o/tika/download.html? The vote is open for the next 72 hours. It looks like people.apache.org is not accessible at the moment (I downloaded the release candidate yesterday), so it might be a good idea to extend the vote period over the Easter holidays. BR, Jukka Zitting
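For anyone who wants to see what producing the SHA1 checksum Jukka asked for could look like, here is a minimal sketch in Python. It is not part of the Tika or Nutch release process; the function names and the `<archive>.sha1` file naming are assumptions made for illustration only.

```python
import hashlib
from pathlib import Path


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA1 digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large release archives are not loaded at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_checksum_file(path):
    """Write '<archive>.sha1' next to the archive (hypothetical naming) and return its path."""
    archive = Path(path)
    checksum_path = archive.with_name(archive.name + ".sha1")
    checksum_path.write_text(f"{sha1_of_file(archive)}  {archive.name}\n")
    return checksum_path
```

A reviewer could then recompute the digest of the downloaded archive with `sha1_of_file` and compare it with the published value before voting.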
Re: [VOTE] Apache Tika 0.7 Release Candidate #1
Hey Jukka, Sounds good to me then if no one else objects. I'll wait the 72 hrs (Sat, 4:01 PM EST) and then assuming the VOTE passes, roll the releases out to the mirrors and then work on Nutch 1.1. Cheers, Chris On 4/2/10 11:41 AM, Jukka Zitting jukka.zitt...@gmail.com wrote: Hi, On Fri, Apr 2, 2010 at 4:14 PM, Mattmann, Chris A (388J) chris.a.mattm...@jpl.nasa.gov wrote: +1s, so technically we could still do the 72 hrs and still be OK, but I'm fine with giving folks some more time to take a look I'm fine with closing the vote already at 72 hours since the p.a.o outage only seemed to last a few hours. Jukka
Question: Nutch 0.8.2 and Nutch 0.7.3?
Hey Guys, Question. I see 2 releases that haven't been cut in JIRA: 0.8.2: https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&pid=10680&fixfor=12312064 0.7.3: https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&pid=10680&fixfor=12312176 I'm happy to cut 0.8.2 as part of the 1.1 effort, to get it out the door. However, I have a question: is this Nutch 0.8.2 in SVN? http://svn.apache.org/repos/asf/lucene/nutch/branches/branch-0.8/ Nutch 0.7.3 has no issues associated with it, so should I remove it? It's been a few years since it was created it seems and I don't think it's got active maintenance, or a user base. Cheers, Chris
Re: Question: Nutch 0.8.2 and Nutch 0.7.3?
Hey Andrzej, http://svn.apache.org/repos/asf/lucene/nutch/branches/branch-0.8/ That's the code that was intended to become 0.8.2 ... However, I'm not sure whether there's any benefit in releasing either of these. Those who really had the need to track this branch (or 0.7) likely used the code from this branch even though it wasn't released. And I believe we are not interested in maintaining a new release based on this code...? No problem, just wanted to gauge interest. Is everyone OK with me closing out those releases in JIRA, then? Cheers, Chris
Re: release of 1.1?
Thanks Julien! OK, I'll cut the RC at some point today. Thanks! Cheers, Chris On 4/6/10 4:47 AM, Julien Nioche lists.digitalpeb...@gmail.com wrote: Chris, Just to let you know that I have committed https://issues.apache.org/jira/browse/NUTCH-810 which was the last open issue before the release of 1.1 Thanks Julien
[VOTE] Apache Nutch 1.1 Release Candidate #1
Hi Folks, I have posted a candidate for the Apache Nutch 1.1 release. The source code is at: http://people.apache.org/~mattmann/apache-nutch-1.1/rc1/ See the included CHANGES.txt file for details on release contents and latest changes. The release was made using the Nutch release process, documented on the Wiki here: http://bit.ly/d5ugid A Nutch 1.1 tag is at: http://svn.apache.org/repos/asf/lucene/nutch/tags/1.1/ Please vote on releasing these packages as Apache Nutch 1.1. The vote is open for the next 72 hours. Only votes from Lucene PMC are binding, but everyone is welcome to check the release candidate and voice their approval or disapproval. The vote passes if at least three binding +1 votes are cast. [ ] +1 Release the packages as Apache Nutch 1.1. [ ] -1 Do not release the packages because... Thanks! Cheers, Chris
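The passing rule stated in the vote message (at least three binding +1 votes) can be illustrated with a small Python sketch. This only encodes the rule as worded above; the function name, the dictionary layout, and the voter names in the usage note are hypothetical, and it deliberately ignores any further conditions that ASF release policy may impose.

```python
def tally_release_vote(votes, binding_voters, required_binding=3):
    """Summarize a release vote.

    votes: dict mapping voter name -> "+1" or "-1" (other values are ignored).
    binding_voters: set of names whose votes are binding (e.g. PMC members).
    """
    binding_plus = sorted(
        voter for voter, choice in votes.items()
        if choice == "+1" and voter in binding_voters
    )
    non_binding_plus = sorted(
        voter for voter, choice in votes.items()
        if choice == "+1" and voter not in binding_voters
    )
    minus = sorted(voter for voter, choice in votes.items() if choice == "-1")
    return {
        "binding_plus_one": binding_plus,
        "non_binding_plus_one": non_binding_plus,
        "minus_one": minus,
        # Rule as stated in the message: at least three binding +1 votes.
        "passes": len(binding_plus) >= required_binding,
    }
```

For example, with made-up voters `alice`, `bob`, and `carol` as binding +1 votes and `erin` as a non-binding +1, the result reports `passes` as true, while a single binding +1 does not pass.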
Re: [DISCUSS] Board resolution for Nutch as TLP
Hi Andrzej, +1, with the following amendment: RESOLVED, that all responsibilities pertaining to the Apache Lucene Nutch sub-project encumbered upon the Apache Nutch Project are hereafter discharged. This should read: RESOLVED, that all responsibilities pertaining to the Apache Lucene Nutch sub-project encumbered upon the Apache Lucene Project are hereafter discharged. Cheers, Chris
Re: Adding jpeg parser to nutch
Hi David, The latest Nutch release candidate (1.1, http://svn.apache.org/repos/asf/lucene/nutch/tags/1.1) includes the tika-parser plugin, which provides a JpegParser (see here: http://bit.ly/b0zRX8) that hopefully can suit your needs. Let me know what you think. Cheers, Chris On 4/10/10 6:56 AM, Gombkötő Dávid madav...@gmail.com wrote: Hello. I'm working on a school task, which is to modify Nutch so that it can identify and download JPEGs, create a thumbnail, and index the URLs of these JPEGs with the other crawl results so that the web interface can show images as well. At the start I found that ParserNotFound.java can do the trick for me. I modified the constructor so that it matches the end of the URL against a pattern, and if it ends in jpeg it creates a file named after the md5sum of the URL and writes the URL into it, in a directory on my filesystem. Well, this is ugly; I wanted to add the working directory to ParserNotFound.java, but I couldn't. To move forward with my work, I first need to know how to make my own JPEG parser. After that I would like to index my results somehow :) So, my question: how can I add my JPEG parser? Or, how can I add a new parser to the Nutch system? Thanks for your answers.
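To make the workaround David describes concrete (match the end of the URL against a JPEG pattern, then write the URL into a file named after the MD5 sum of that URL), here is a rough standalone sketch in Python. It is not Nutch code and does not use the Nutch or Tika APIs; the function names and the output directory argument are assumptions for illustration.

```python
import hashlib
import re
from pathlib import Path
from urllib.parse import urlsplit

# Matches paths ending in .jpg or .jpeg, case-insensitively.
JPEG_SUFFIX = re.compile(r"\.jpe?g$", re.IGNORECASE)


def is_jpeg_url(url):
    """Return True if the path part of the URL ends in .jpg or .jpeg."""
    return bool(JPEG_SUFFIX.search(urlsplit(url).path))


def record_jpeg_url(url, out_dir):
    """Write the URL into a file named after the MD5 hex digest of the URL.

    Returns the path of the file written, or None if the URL is not a JPEG URL.
    """
    if not is_jpeg_url(url):
        return None
    name = hashlib.md5(url.encode("utf-8")).hexdigest()
    target = Path(out_dir) / name
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(url + "\n", encoding="utf-8")
    return target
```

Inside Nutch itself the equivalent logic would live in a Java parser plugin, which is why the tika-parser plugin mentioned above is likely the better starting point.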
Re: [DISCUSS] Board resolution for Nutch as TLP
Hi Dogacan, +1 to calling it a web search platform, since I agree, it’s not just a crawler. Cheers, Chris On 4/11/10 11:40 AM, Doğacan Güney doga...@gmail.com wrote: Hi, On Sat, Apr 10, 2010 at 16:32, Jukka Zitting jukka.zitt...@gmail.com wrote: Hi, On Fri, Apr 9, 2010 at 6:52 PM, Andrzej Bialecki a...@getopt.org wrote: WHEREAS, the Board of Directors deems it to be in the best interests of the Foundation and consistent with the Foundation's purpose to establish a Project Management Committee charged with the creation and maintenance of open-source software related to a large-scale web crawling platform for distribution at no charge to the public. Would it make sense to simplify the scope to ... open-source software related to large-scale web crawling for distribution at no charge to the public? Actually, shouldn't that be something like web search platform, or maybe a crawling and search platform? Nutch is not just a crawler. Anyway, +1 from me. BR, Jukka Zitting -- Doğacan Güney
Re: [VOTE 2] Board resolution for Nutch as TLP
+1, thanks for pushing this forward Andrzej! Cheers, Chris On 4/12/10 4:32 AM, Doğacan Güney doga...@gmail.com wrote: On Mon, Apr 12, 2010 at 14:08, Andrzej Bialecki a...@getopt.org wrote: Hi, Take two, after s/crawling/search/ ... Following the discussion, below is the text of the proposed Board Resolution to vote upon. [] +1. Request the Board make Nutch a TLP [] +0. I don't feel strongly about it, but I'm okay with this. [] -1. No, don't request the Board make Nutch a TLP, and here are my reasons... This is a majority count vote (i.e. no vetoes). The vote is open for 72 hours. Here's my +1. And here is my +1. === X. Establish the Apache Nutch Project WHEREAS, the Board of Directors deems it to be in the best interests of the Foundation and consistent with the Foundation's purpose to establish a Project Management Committee charged with the creation and maintenance of open-source software related to a large-scale web search platform for distribution at no charge to the public. NOW, THEREFORE, BE IT RESOLVED, that a Project Management Committee (PMC), to be known as the Apache Nutch Project, be and hereby is established pursuant to Bylaws of the Foundation; and be it further RESOLVED, that the Apache Nutch Project be and hereby is responsible for the creation and maintenance of software related to a large-scale web search platform; and be it further RESOLVED, that the office of Vice President, Apache Nutch be and hereby is created, the person holding such office to serve at the direction of the Board of Directors as the chair of the Apache Nutch Project, and to have primary responsibility for management of the projects within the scope of responsibility of the Apache Nutch Project; and be it further RESOLVED, that the persons listed immediately below be and hereby are appointed to serve as the initial members of the Apache Nutch Project: • Andrzej Bialecki a...@... • Otis Gospodnetic o...@... • Dogacan Guney doga...@... • Dennis Kubes ku...@... 
• Chris Mattmann mattm...@... • Julien Nioche jnio...@... • Sami Siren si...@... RESOLVED, that the Apache Nutch Project be and hereby is tasked with the migration and rationalization of the Apache Lucene Nutch sub-project; and be it further RESOLVED, that all responsibilities pertaining to the Apache Lucene Nutch sub-project encumbered upon the Apache Lucene Project are hereafter discharged. NOW, THEREFORE, BE IT FURTHER RESOLVED, that Andrzej Bialecki be appointed to the office of Vice President, Apache Nutch, to serve in accordance with and subject to the direction of the Board of Directors and the Bylaws of the Foundation until death, resignation, retirement, removal or disqualification, or until a successor is appointed. === -- Best regards, Andrzej Bialecki -- Doğacan Güney
[VOTE] Apache Nutch 1.1 Release Candidate #2
Hi Folks, I have posted an updated candidate for the Apache Nutch 1.1 release. The source code is at: http://people.apache.org/~mattmann/apache-nutch-1.1/rc2/ The major difference between this release and rc #1 is the application of NUTCH-812 - Crawl.java incorrectly uses the Generator API resulting in NPE - as well as some commits by Sami Siren to fix missing ASL license headers. For more detailed information, see the included CHANGES.txt file for details on release contents and latest changes. The release was made using the Nutch release process, documented on the Wiki here: http://bit.ly/d5ugid A Nutch 1.1 tag is at: http://svn.apache.org/repos/asf/lucene/nutch/tags/1.1/ note There was a request by Sami Siren that the tutorial be updated to reflect the fact that this release is a source-only release, as well as a request to integrate RAT into the build, however, in the interest of getting this 1.1 out and getting going on the Nutch TLP, my proposal is: * update the docs independent of this release (the tutorial as it exists right now says 0.7 on it anyways and doesn't look like it's been updated in a while, so I think users can live with what's there and support on u...@nutch.apache.org or d...@nutch.apache.org until it's updated) * begin source only releases in general since we've long had the debate as to the size of the Nutch release. Most folks that use Nutch are likely familiar with running ant IMHO. * run RAT and integrate into the build /note Please vote on releasing these packages as Apache Nutch 1.1. The vote is open for the next 72 hours. Since Nutch is now a TLP and has its own PMC, there is a question of who are the binding release VOTES in this particular thread. My gut reaction is that since I started this release while we were under the Lucene PMC, for continuity purposes, only votes from Lucene PMC are binding, but everyone (especially newly minted Nutch PMC members!) 
are welcome to check the release candidate and voice their approval or disapproval. The vote passes if at least three binding +1 votes are cast. [ ] +1 Release the packages as Apache Nutch 1.1. [ ] -1 Do not release the packages because... Thanks! Cheers, Chris P.S. Here is my +1.
Re: Running ANT; was -- Re: [VOTE] Apache Nutch 1.1 Release Candidate #2
Hi David, Thanks. In fact, running ant is probably simpler than running Nutch. The steps would be: * what OS are you on (Ant is available for all of them to my knowledge)? * if you need ant, grab a distro from ant.apache.org, otherwise, I'll assume that you've got ant installed and callable from the command line. * unpack the nutch src distribution, cd into that directory, type ant job, and there you go. HTH! You could try it out by taking the Nutch src code from SVN at: http://svn.apache.org/repos/asf/lucene/nutch/tags/1.1, and then trying the steps above. Cheers, Chris On 4/26/10 7:24 AM, David M. Cole d...@colegroup.com wrote: At 10:55 PM -0700 4/25/10, Mattmann, Chris A (388J) wrote: Most folks that use Nutch are likely familiar with running ant IMHO. I guess then I fall into the category of not most folks. Have been running Nutch for about 14 months and I haven't a clue how to run ant. If there's a place to vote to suggest that compiled versions still be distributed, I vote for that. Thanks. \dmc -- David M. Cole d...@colegroup.com Editor & Publisher, NewsInc. http://newsinc.net V: (650) 557-2993 Consultant: The Cole Group http://colegroup.com/ F: (650) 475-8479
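The build steps above (check that ant is callable, change into the unpacked source directory, run the `job` target) can be scripted. Below is a minimal sketch in Python under those assumptions; the function name and the `dry_run` flag are made up for illustration, and only the `ant job` command itself comes from the message.

```python
import shutil
import subprocess


def build_nutch_job(source_dir, target="job", dry_run=False):
    """Run 'ant <target>' inside an unpacked Nutch source directory.

    With dry_run=True the command is returned without being executed.
    """
    command = ["ant", target]
    if dry_run:
        return command
    # Fail early with a readable message if ant is not installed.
    if shutil.which("ant") is None:
        raise RuntimeError("ant was not found on PATH; get a distro from ant.apache.org")
    # check=True raises CalledProcessError if the build fails.
    subprocess.run(command, cwd=source_dir, check=True)
    return command
```

Calling `build_nutch_job("/path/to/unpacked/nutch")` is then equivalent to typing `ant job` in that directory, where the path is whatever location the source distribution was unpacked to.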
Re: [VOTE] Apache Nutch 1.1 Release Candidate #2
Hi Grant, Thanks. I think it actually makes sense to finish off 1.1, and since there is overlap with the Nutch PMC and the Lucene PMC and since the thread started in Lucene before the TLP, I think it would be great e.g., if Andrzej, and Sami could check the release and that way we still have the continuity and can safely push it out as the last Nutch rel under the Lucene umbrella... Then all releases post 1.1 can cleanly be done under the auspices of the new PMC :) Cheers, Chris On 4/26/10 5:34 AM, Grant Ingersoll gsing...@apache.org wrote: Might I suggest, that since Nutch is now a TLP that you delay this release by a few weeks and have the vote done under the auspices of the Nutch PMC? Cheers, Grant
Re: [VOTE] Apache Nutch 1.1 Release Candidate #2
Hey Andrzej, Okey dokey, np! Let's get the patch in first :) I can cut as many RCs as needed. Cheers, Chris On 4/26/10 11:30 AM, Andrzej Bialecki a...@getopt.org wrote: On 2010-04-26 17:19, Mattmann, Chris A (388J) wrote: Hi Grant, Thanks. I think it actually makes sense to finish off 1.1, and since there is overlap with the Nutch PMC and the Lucene PMC and since the thread started in Lucene before the TLP, I think it would be great e.g., if Andrzej, and Sami could check the release and that way we still have the continuity and can safely push it out as the last Nutch rel under the Lucene umbrella... Then all releases post 1.1 can cleanly be done under the auspices of the new PMC :) I know that Dennis Kubes just discovered a bug in SegmentMerger (he may report on it in a moment) - this bug has been there for a while, it's likely the cause of the mysterious out of disk space errors, and it manifests itself only with input files larger than HDFS block size (64MB). Since 1.1 is likely the final release of Nutch 1.x I think it would make sense to fix this bug before we release ... -- Best regards, Andrzej Bialecki
[VOTE] Apache Nutch 1.1 Release Candidate #3
Hi Folks, I have posted an updated candidate for the Apache Nutch 1.1 release. The source code is at: http://people.apache.org/~mattmann/apache-nutch-1.1/rc3/ The major differences between this release and rc #2 are the application of: NUTCH-816, NUTCH-732, NUTCH-815, NUTCH-814, and NUTCH-812 based on feedback from prior release candidates. For more detailed information, see the included CHANGES.txt file for details on release contents and latest changes. The release was made using the Nutch release process, documented on the Wiki here: http://bit.ly/d5ugid A Nutch 1.1 tag is at: http://svn.apache.org/repos/asf/lucene/nutch/tags/1.1/ note In response to several user requests during the last RC cycle, I've also included *binary* releases (labeled as apache-nutch-1.1-bin.tar.gz and apache-nutch-1.1-bin.zip). This addresses Sami Siren's request that the tutorial be updated to reflect the fact that this release is a source-only release. Sami also requested to integrate RAT into the build, however, in the interest of getting this 1.1 out and getting going on the Nutch TLP, my proposal is: * run RAT and integrate into the build on releases post 1.1 /note Please vote on releasing these packages as Apache Nutch 1.1. The vote is open for the next 72 hours. Only votes from Nutch PMC are binding, but folks are welcome to check the release candidate and voice their approval or disapproval. The vote passes if at least three binding +1 votes are cast. [ ] +1 Release the packages as Apache Nutch 1.1. [ ] -1 Do not release the packages because... Thanks! Cheers, Chris P.S. Here is my +1.