[jira] [Commented] (NUTCH-1927) Create a whitelist of IPs/hostnames to allow skipping of RobotRules parsing

2015-04-17 Thread Mattmann, Chris A (388J) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14500604#comment-14500604 ] Mattmann, Chris A (388J) commented on NUTCH-1927: +1 please commit

[jira] [Commented] (NUTCH-1832) Make Nutch work without an indexer

2014-09-04 Thread Mattmann, Chris A (388J) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14121477#comment-14121477 ] Mattmann, Chris A (388J) commented on NUTCH-1832: Will reply in more

Re: Important : Bunch of Spam Created under Nutch Wiki!!

2013-04-01 Thread Mattmann, Chris A (388J)
at 4:29 PM, Mattmann, Chris A (388J) chris.a.mattm...@jpl.nasa.gov wrote: Hey Kiran, I think here: http://wiki.apache.org/general/OurWikiFarm#per_wiki_access_control_-_tighten_your_wiki_just_a_little.2C_benefit_just_a_lot Cheers, Chris

Re: Important : Bunch of Spam Created under Nutch Wiki!!

2013-04-01 Thread Mattmann, Chris A (388J)
Thanks Kiran! ++ Chris Mattmann, Ph.D. Senior Computer Scientist NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA Office: 171-266B, Mailstop: 171-246 Email: chris.a.mattm...@nasa.gov WWW: http://sunset.usc.edu/~mattmann/

Re: Nutch Wiki

2013-03-30 Thread Mattmann, Chris A (388J)
Seconded! ++ Chris Mattmann, Ph.D. Senior Computer Scientist NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA Office: 171-266B, Mailstop: 171-246 Email: chris.a.mattm...@nasa.gov WWW: http://sunset.usc.edu/~mattmann/

Re: Important : Bunch of Spam Created under Nutch Wiki!!

2013-03-28 Thread Mattmann, Chris A (388J)
Hi Kiran, Yes, my recommendation: 1. Jump into #asfinfra on freenode, find Joe, or Gavin or Daniel, ask for help. If you don't have IRC, email infrastruct...@apache.org and/or file a https://issues.apache.org/jira/browse/INFRA ticket 2. Request that they enable ASAP ContributorsGroup only

Re: [Nutch Wiki] Trivial Update of PGOSimone by PGOSimone

2013-03-25 Thread Mattmann, Chris A (388J)
Hey Julien, I heard on #asfinfra that any of our MoinMoin wikis have been attacked recently by SPAM. I think we may want to contact infra and ask for specific ContributorsGroup only Nutch wiki access. http://wiki.apache.org/general/OurWikiFarm Cheers, Chris From: Julien Nioche

Re: GSOC 2013 project: Apache-Wicket based Nutch webapp

2013-03-24 Thread Mattmann, Chris A (388J)
a decent UI running with functionalities. Regards, Kiran. On Sat, Mar 23, 2013 at 2:33 PM, Mattmann, Chris A (388J) chris.a.mattm...@jpl.nasa.gov wrote: That is so awesome Kiran. Great job and I would love a link to your thesis (or even seeing the work in progress) if you are willing to share

Re: Google Summer of Code 2013 - Giraph implementation of Nutch LinkRank Algorithm

2013-03-24 Thread Mattmann, Chris A (388J)
Super +1 -- sounds awesome Lewis. Cheers, Chris On 3/24/13 12:38 PM, Lewis John Mcgibbney lewis.mcgibb...@gmail.com wrote: Hi All, After some discussion and drumming up of interest within the Giraph community, I've logged a Google Summer of Code issue [0] for this topic. We are looking for

Re: GSOC 2013 project: Apache-Wicket based Nutch webapp

2013-03-24 Thread Mattmann, Chris A (388J)
with functionalities. Regards, Kiran. On Sat, Mar 23, 2013 at 2:33 PM, Mattmann, Chris A (388J) chris.a.mattm...@jpl.nasa.gov wrote: That is so awesome Kiran. Great job and I would love a link to your thesis (or even seeing the work in progress) if you are willing

Re: GSOC 2013 project: Apache-Wicket based Nutch webapp

2013-03-23 Thread Mattmann, Chris A (388J)
possible. Thank you, Kiran. On Sat, Mar 23, 2013 at 11:23 AM, Mattmann, Chris A (388J) chris.a.mattm...@jpl.nasa.gov wrote: Hi Kiran, Great, yes the REST services need work for sure. They haven't been worked on in a while. I'm privy to Apache CXF, but I

GSOC 2013 project: Apache-Wicket based Nutch webapp

2013-03-22 Thread Mattmann, Chris A (388J)
Hey Guys, I posted: https://issues.apache.org/jira/browse/NUTCH-841 As a potential GSOC 2013 summer project. I'm willing to mentor it, since I love Wicket, and I'm willing to maintain the result as a Nutch committer. If NUTCH-841 doesn't get selected, I'll start implementing it this summer if

FW: GSoC 2013

2013-03-18 Thread Mattmann, Chris A (388J)
[Apologies for cross post] Guys, to play in the GSoC 2013 spec, we just need to tag issues in JIRA with the gsoc2013 tag. I'll try and come up with a few projects soon :) Cheers, Chris On 3/15/13 11:15 AM, Luciano Resende luckbr1...@gmail.com wrote: On Fri, Mar 15, 2013 at 11:01 AM, Manish

Re: [ANNOUNCEMENT] Welcome Kiran Chitturi as Apache Nutch PMC and Committer

2013-03-10 Thread Mattmann, Chris A (388J)
This is great to hear Kiran, welcome to the team! Cheers, Chris From: Julien Nioche lists.digitalpeb...@gmail.com Reply-To: dev@nutch.apache.org Date: Sunday, March 10, 2013 2:15 PM

FW: [OPENING] Google Summer of Code Applications

2013-03-10 Thread Mattmann, Chris A (388J)
FYI On 3/10/13 5:10 PM, Lewis John Mcgibbney lewis.mcgibb...@gmail.com wrote: I just told a huge lie. I got my dates mixed up... Students have from between April 22nd and May 3rd to get proposals in. Sorry about the mix up. Lewis On Sun, Mar 10, 2013 at 5:09 PM, Lewis John Mcgibbney

Re: Review board giving issue

2013-03-07 Thread Mattmann, Chris A (388J)
Hi Tejas, Yeah I was having some issue at the time, but will try and see if it is working tomorrow. If it's still not working we can contact infra@ Cheers, Chris From: Tejas Patil tejas.patil...@gmail.com Reply-To: dev@nutch.apache.org

Re: [DISCUSS] Google Summer of Code

2013-03-04 Thread Mattmann, Chris A (388J)
Hey Lewis, Great job starting this thread. +1 Giraph is welcome here. Multi-project GSoCs always do well. One thing I had in mind was taking an implementation of Hubs and Authorities developed for Nutch 1.3 a few years back in my USC class and then having someone integrate it into the current

Re: [DISCUSS] Google Summer of Code

2013-03-04 Thread Mattmann, Chris A (388J)
Hey Markus, Yep my student implemented HITS (on the fly) ranking, and classification (I think). It's been sitting on my HD for 2 years :( So if someone can pick it up it would be a nice GSoC project. Glad to hear there is interest. Cheers, Chris On 3/4/13 1:21 PM, Markus Jelsma

Re: [DISCUSS] Google Summer of Code

2013-03-04 Thread Mattmann, Chris A (388J)
Hey Markus: https://issues.apache.org/jira/browse/NUTCH-1539 Will submit the code soon. Cheers, Chris On 3/4/13 1:43 PM, Markus Jelsma markus.jel...@openindex.io wrote: Ah yes! Please open an issue and if you can attach anything that matters such as a description of the algorithm, how it

Re: Nutch JAVA Application

2013-02-12 Thread Mattmann, Chris A (388J)
Hi Shann, Thank you for reaching out! If your goal is to get your project integrated into Apache Nutch, proper, then I would recommend simply: 0. File some JIRA issues in Apache Nutch http://issues.apache.org/jira/browse/NUTCH Small incremental patches and issues are preferred and this will let

FW: [GSoC Mentors] Google Summer of Code 2013

2013-02-11 Thread Mattmann, Chris A (388J)
[Sorry for cross posting] Guys, FYI please note that you can participate as a mentor from a PMC via Apache as they are a GSoC org. ComDev will coordinate our participation but start thinking about what projects we may want to do. Cheers, Chris From: Carol Smith

Re: [DISCUSS] Nutch Policy/Opinion on Review Board

2013-01-31 Thread Mattmann, Chris A (388J)
I love it and will use it but don't think it needs to be a policy; to each their own :) Thanks buddy Sent from my iPhone On Jan 31, 2013, at 3:58 PM, Lewis John Mcgibbney lewis.mcgibb...@gmail.com wrote: Hi All, I thought I would create this thread as the Review Board platform has been

Re: review board

2013-01-26 Thread Mattmann, Chris A (388J)
Hey Tejas, Yeah I think this has to do with something in the repo URL on the RB server side. I would file an INFRA ticket, or jump on #asfinfra on IRC and ask one of the guys for help there. Cheers, Chris From: Tejas Patil tejas.patil...@gmail.com Reply-To:

Re: 1.8 in Jira

2012-12-21 Thread Mattmann, Chris A (388J)
woot yep ;) On 12/21/12 2:55 AM, Markus Jelsma markus.jel...@openindex.io wrote: forget it, i meant 1.7 but it's there already! -Original message- From:Markus Jelsma markus.jel...@openindex.io Sent: Fri 21-Dec-2012 11:54 To: dev@nutch.apache.org dev@nutch.apache.org Subject: 1.8 in

Re: [VOTE] Apache Nutch 1.6 Release Candidate

2012-11-29 Thread Mattmann, Chris A (388J)
Thanks guys. I should review this today. Cheers, Chris On Nov 29, 2012, at 5:31 AM, Lewis John Mcgibbney wrote: Hi, On Wed, Nov 28, 2012 at 10:11 AM, Julien Nioche lists.digitalpeb...@gmail.com wrote: - CHANGES.txt contains dates in both MM/DD/ and DD/MM/ formats. Shall we

Re: Strategy for Assigning Issues by Version

2012-11-29 Thread Mattmann, Chris A (388J)
Hey Lewis, On Nov 29, 2012, at 5:54 AM, Lewis John Mcgibbney wrote: Hi All, Right now I found myself facing a bit of a dilemma w.r.t bumping on the issues for the next Nutch release. Currently due to legacy workflows, we have some 120 issues assigned for 1.6... however ALL issues have

Re: Strategy for Assigning Issues by Version

2012-11-29 Thread Mattmann, Chris A (388J)
+50 :) On Nov 29, 2012, at 8:32 AM, Lewis John Mcgibbney wrote: So in summary, We retain the legacy behavior and bump them ALL to 1.7 In the 1.7 development drive (if and when we can) we make an effort to act on patched issues in an attempt to pick the low hanging fruit so to speak...

Re: [DISCUSS] trunk release?

2012-11-22 Thread Mattmann, Chris A (388J)
Release early, release often :) I'd say I'd be happy to try and spin it, but you'd beat me to it so I just will say I'll be happy to test the RC and voice my VOTE when you roll it Lewis :) Happy Thanksgiving (even though you're not in the States yet)! Cheers, Chris On Nov 22, 2012, at 7:15

Re: [ANNOUNCE] Apache Nutch 2.1 Released

2012-10-05 Thread Mattmann, Chris A (388J)
Great job everyone! Cheers, Chris On Oct 5, 2012, at 9:29 AM, Julien Nioche wrote: Thanks Lewis and well done everyone! Enjoy your week end Julien On 5 October 2012 16:12, lewis john mcgibbney lewi...@apache.org wrote: Good Afternoon Everyone, The Apache Nutch PMC are very pleased

Re: [PING] [VOTE] Apache Nutch 2.1 Release Candidate Available

2012-10-04 Thread Mattmann, Chris A (388J)
Thanks for your VOTE! Cheers, Chris On Oct 4, 2012, at 1:08 AM, j.sulli...@thomsonreuters.com j.sulli...@thomsonreuters.com wrote: A bit late but my two cents. I have done a couple of installs on Ubuntu 12.04 using MySQL for the backend and have noticed a couple of the improvements and no

Re: Status of 2.1 release

2012-09-21 Thread Mattmann, Chris A (388J)
Take care dude! I'll give trunk a shot... Cheers, Chris On Sep 21, 2012, at 7:34 AM, Lewis John Mcgibbney wrote: Hi All, Basically thank god it was brought to our attention that gora-cassandra 0.2.1 is buggy and needs some work before it is ready to be integrated into a stable Nutch 2.x

Re: svn commit: r1387363 - in /nutch/branches/2.1: CHANGES.txt build.xml pom.xml

2012-09-18 Thread Mattmann, Chris A (388J)
Lewis you beat me to it, you ROCK! Cheers, Chris On Sep 18, 2012, at 5:11 PM, lewi...@apache.org lewi...@apache.org wrote: Author: lewismc Date: Tue Sep 18 21:11:06 2012 New Revision: 1387363 URL: http://svn.apache.org/viewvc?rev=1387363&view=rev Log: forward port of NUTCH-1415

Re: Nutch 2.1 Release???

2012-09-15 Thread Mattmann, Chris A (388J)
+1 I'd be happy to help! Cheers, Chris On Sep 15, 2012, at 9:24 AM, Lewis John Mcgibbney wrote: Hi Everyone, Without me slevering on, this suggestion speaks for itself. We have resolved 32 issues, including pulling in upgrades on the Gora dependency. It would be nice to push these

Re: Nutch 2.1 Release???

2012-09-15 Thread Mattmann, Chris A (388J)
. If you can do RM role it would be great. Best Lewis On Sat, Sep 15, 2012 at 6:07 PM, Mattmann, Chris A (388J) chris.a.mattm...@jpl.nasa.gov wrote: +1 I'd be happy to help! Cheers, Chris On Sep 15, 2012, at 9:24 AM, Lewis John Mcgibbney wrote: Hi Everyone, Without me slevering

Re: Nutch talk accepted at ApacheCon Europe

2012-09-13 Thread Mattmann, Chris A (388J)
Great to hear, Julien, nice! Cheers, Chris On Sep 13, 2012, at 3:39 AM, Julien Nioche wrote: Hi, I'd just like to mention that I will be giving a talk about Nutch at the Apache Conference Europe (Sinsheim, Germany 5–8 November 2012). The Apache Conference should be a good opportunity

Re: Happy 10th Birthday Nutch!

2012-08-22 Thread Mattmann, Chris A (388J)
/viewer.php#/detail/254365383887354210_4414285 http://statigr.am/viewer.php#/detail/254365383887354210_4414285 Cheers, Jérôme On Fri, Aug 10, 2012 at 1:44 AM, Mattmann, Chris A (388J) chris.a.mattm...@jpl.nasa.gov

Fwd: Call for Papers for ApacheCon Europe 2012 now open!

2012-07-19 Thread Mattmann, Chris A (388J)
FYI... Begin forwarded message: From: Nick Burch nick.bu...@alfresco.com Date: July 19, 2012 1:14:57 PM CDT To: committ...@apache.org Subject: Call for Papers for ApacheCon Europe 2012 now open! Reply-To: apachecon-disc...@apache.org Hi All We're pleased to announce that the Call for

Re: Apache Nutch being used at National Snow and Ice Data Center: ESIP Federation

2012-07-17 Thread Mattmann, Chris A (388J)
Hi Markus, Great question. I am CC'ing Ruth Duerr and Ian Truslove at NSIDC -- maybe they can provide more information? Ruth, Ian, please consider subscribing to dev@nutch.apache.org and/or u...@nutch.apache.org by sending blank emails to: dev-subscr...@nutch.apache.org

Re: [ANNOUNCEMENT] Apache Nutch v1.5.1 Released

2012-07-10 Thread Mattmann, Chris A (388J)
Congrats, all! Cheers, Chris On Jul 10, 2012, at 8:03 AM, Julien Nioche wrote: Great Job Lewis! Thanks a lot On 10 July 2012 15:40, lewis john mcgibbney lewi...@apache.org wrote: Good Afternoon Everyone, The Apache Nutch PMC are very pleased to announce the release of Apache Nutch

Re: [PROPOSAL] Rename branch nutchgora into 2.x

2012-07-09 Thread Mattmann, Chris A (388J)
+1 from me. Cheers, Chris On Jul 9, 2012, at 3:37 AM, Julien Nioche wrote: Guys, Now that we've released 2.0, wouldn't it be better to rename the 'nutchgora' branch into something like 'branch-2.x'? Any thoughts on this? Julien -- Open Source Solutions for Text Engineering

Re: [VOTE] Apache Nutch 1.5.1 RC#3

2012-07-07 Thread Mattmann, Chris A (388J)
Hi Lewis, +1 from me! SIGS check out: [chipotle:~/tmp/nutch-1.5.1] mattmann% $HOME/bin/verify_md5_checksums md5sum: stat '*.bz2': No such file or directory apache-nutch-1.5.1-bin.tar.gz: OK apache-nutch-1.5.1-src.tar.gz: OK apache-nutch-1.5.1-bin.zip: OK apache-nutch-1.5.1-src.zip: OK
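For readers who want to reproduce the checksum step in the vote above: `$HOME/bin/verify_md5_checksums` is the voter's own script and its contents are not shown in the thread. The following is only a minimal Python sketch of what such a check might do, assuming each release artifact sits next to a sidecar file named `<artifact>.md5` whose first whitespace-separated token is the hex digest. The function names and the sidecar layout are assumptions made for illustration, not the actual script.

```python
import hashlib
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5_checksums(directory: Path) -> dict:
    """Map each artifact name to True/False for every *.md5 sidecar found.

    Assumption (hypothetical layout): the sidecar is named
    <artifact>.md5 and its first token is the expected hex digest.
    """
    results = {}
    for sidecar in sorted(directory.glob("*.md5")):
        artifact = sidecar.with_suffix("")  # drop the trailing ".md5"
        if not artifact.is_file():
            continue  # sidecar without a matching artifact: skip it
        tokens = sidecar.read_text().split()
        expected = tokens[0].lower() if tokens else ""
        results[artifact.name] = md5_of(artifact) == expected
    return results


if __name__ == "__main__":
    for name, ok in verify_md5_checksums(Path(".")).items():
        print(f"{name}: {'OK' if ok else 'FAILED'}")
```

Run from the directory holding the downloaded release candidate, this prints one `name: OK` or `name: FAILED` line per artifact. An MD5 match only shows the download is intact; the `.asc` signatures still need a separate check against the project KEYS file.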

Re: [VOTE] Apache Nutch 2.0 Release Candidate #3

2012-07-07 Thread Mattmann, Chris A (388J)
Thanks for your hard work here, Lewis! Cheers, Chris On Jul 7, 2012, at 3:44 PM, Lewis John Mcgibbney wrote: Hi Julien, Believe it or not I've just spent around 45 mins waiting on committing the site... broadband in Paris is nothing short of utterly abysmal to say the very best. Please

Re: [VOTE] Apache Nutch 2.0 Release Candidate #3

2012-07-06 Thread Mattmann, Chris A (388J)
, at 11:24 AM, Mattmann, Chris A (388J) wrote: Hey Lewis, I was running ant test -- sorry -- will try ant runtime now (any idea what's up with test?) Cheers, Chris On Jul 3, 2012, at 11:11 AM, Lewis John Mcgibbney wrote: What commands are you using? I just grabbed the src-tar.gz

Re: [VOTE] Apache Nutch 2.0 Release Candidate #3

2012-07-04 Thread Mattmann, Chris A (388J)
does not exist! Build failed Lewis On Wed, Jul 4, 2012 at 7:18 AM, Mattmann, Chris A (388J) chris.a.mattm...@jpl.nasa.gov wrote: Hi Lewis, Odd, I don't get that. I'll try futzing around again with it tomorrow -- what system are you on? What is your Ant version and Java version

Re: [VOTE] Apache Nutch 2.0 Release Candidate #3

2012-07-03 Thread Mattmann, Chris A (388J)
Hey Julien, On Jul 3, 2012, at 7:49 AM, Julien Nioche wrote: [..snip..] OK, so basically signatures and checksums are fine +1, yep they are great. Tried to build and test and got this: [ivy:resolve] :: [..snip...] Try

Re: [VOTE] Apache Nutch 2.0 Release Candidate #3

2012-07-03 Thread Mattmann, Chris A (388J)
Hey Julien, I ran this command: rm -rf /Users/mattmann/.ivy2/ But it still failed with the below messages: [ivy:resolve] :: problems summary :: [ivy:resolve] WARNINGS [ivy:resolve] [FAILED ] org.apache.hadoop#hadoop-core;1.0.3!hadoop-core.jar: invalid sha1:

Re: [VOTE] Apache Nutch 2.0 Release Candidate #3

2012-07-02 Thread Mattmann, Chris A (388J)
I'll try to scope this by tomorrow...thanks Lewis. Cheers, Chris On Jul 2, 2012, at 10:49 AM, Lewis John Mcgibbney wrote: Anyone else for this RC? I've been slightly distracted with a number of things recently and only just getting round to following this one up so apologies about that.

Re: 1.5.1 release

2012-06-22 Thread Mattmann, Chris A (388J)
Hey Guys, (sorry for the top post) There's no reason to freeze trunk during releases. In fact, during the RC, once the branch (or tag for that matter) is created, trunk can continue on, no need to stop. Heck, we can always just tag or branch from a specific revision too so it's not really a

Re: Nutch 1.5 Deploy Mode Doesn't Work like Nutch 1.4 Deploy Mode

2012-06-19 Thread Mattmann, Chris A (388J)
+1! Cheers, Chris On Jun 19, 2012, at 2:26 AM, Julien Nioche wrote: Quite annoying that we did not spot this before releasing. What about a 1.5.1 soonish with this fix + couple smallish improvements e.g. upgrade to Hadoop 1.0.3? J. -- Forwarded message -- From:

Re: VOTE Apache Nutch 2.0 RC1

2012-06-15 Thread Mattmann, Chris A (388J)
with only releasing src. On Thu, Jun 14, 2012 at 11:32 PM, Mattmann, Chris A (388J) chris.a.mattm...@jpl.nasa.gov wrote: Or just not ship a bin release at all. Src is the only thing we really VOTE on legally though bin is provided for convenience purposes. Will type more on this later

Re: VOTE Apache Nutch 2.0 RC1

2012-06-14 Thread Mattmann, Chris A (388J)
Hey Guys, I think the annoyance is probably something folks can live with as they have been waiting for an official release of 2.x for years :) My +1 to roll RC #2 with or without a solution to this and mark it as a TODO. Release early, release often :) Cheers, Chris On Jun 14, 2012, at 10:04

Re: VOTE Apache Nutch 2.0 RC1

2012-06-14 Thread Mattmann, Chris A (388J)
On 14 June 2012 20:27, Mattmann, Chris A (388J) chris.a.mattm...@jpl.nasa.gov wrote: Hey Guys, I think the annoyance is probably something folks can live with as they have been waiting for an official release of 2.x for years :) My +1 to roll RC #2

Re: Suitable Nutch 2.0 Project Description

2012-06-13 Thread Mattmann, Chris A (388J)
+1 to the description w/o experimental too (I agree with Ferdy). You guys ROCK. Cheers, Chris On Jun 13, 2012, at 5:29 AM, Lewis John Mcgibbney wrote: Hi, Seeing as we have the ball rolling with the 2.0 RC. I thought I'd ask about a suitable project descriptor. So far on trunk we have

Re: VOTE Apache Nutch 2.0 RC1

2012-06-12 Thread Mattmann, Chris A (388J)
Hey Lewis, I will get to this tonight, for sure. Thanks! Cheers, Chris On Jun 12, 2012, at 1:16 PM, Lewis John Mcgibbney wrote: Hi Everyone, I appreciate that most of the core dev's are using trunk, however I would appeal to you guys to at least check out the artifacts and check sigs,

Re: VOTE Apache Nutch 2.0 RC1

2012-06-12 Thread Mattmann, Chris A (388J)
Hey Guys, #2 is probably reason enough for a respin. Lewis if you don't have time to do it before Thursday, I could probably give it a whack. Let me know. Cheers, Chris On Jun 12, 2012, at 3:33 PM, Sebastian Nagel wrote: Hi Lewis, my first steps with 2.0 (to be continued, still

Re: [VOTE] Apache Nutch 1.5 release-1.5RC4

2012-06-01 Thread Mattmann, Chris A (388J)
Hey Lewis, +1 from me! SIGS check out: [chipotle:nutch-dev/1.5-release/rc4] mattmann% ls apache-nutch-1.5-bin.tar.gz apache-nutch-1.5-bin.zip apache-nutch-1.5-src.tar.gz apache-nutch-1.5-src.zip apache-nutch-1.5-bin.tar.gz.asc apache-nutch-1.5-bin.zip.asc

Re: [VOTE] Apache Nutch release 1.5 RC3

2012-05-31 Thread Mattmann, Chris A (388J)
Hey Guys, Does this warrant a respin, or are you +1 Juls? Cheers, Chris On May 31, 2012, at 1:44 AM, Julien Nioche wrote: Hi Lewis, Minor nitpick : the directory /runtime is not necessary as it is built with ANT. Removing it would massively reduce the size of the archive. Could we fix

Re: [VOTE] Apache Nutch release 1.5 RC3

2012-05-31 Thread Mattmann, Chris A (388J)
2012 15:24, Mattmann, Chris A (388J) chris.a.mattm...@jpl.nasa.gov wrote: Hey Guys, Does this warrant a respin, or are you +1 Juls? Cheers, Chris On May 31, 2012, at 1:44 AM, Julien Nioche wrote: Hi Lewis, Minor nitpick : the directory /runtime is not necessary as it is built

Re: [VOTE] Apache Nutch release 1.5 RC3

2012-05-31 Thread Mattmann, Chris A (388J)
this comply with release policy? Thanks Lewis On Thu, May 31, 2012 at 3:49 PM, Mattmann, Chris A (388J) chris.a.mattm...@jpl.nasa.gov wrote: okey dokey. I will try and take the time to review the RC today. Thanks for pushing this Lewis! Cheers, Chris On May 31, 2012, at 7:36 AM, Julien

Re: 1.5 RC2

2012-05-22 Thread Mattmann, Chris A (388J)
+1 happy for Lewis to try I've been swamped! Sent from my iPhone On May 22, 2012, at 2:16 AM, Julien Nioche lists.digitalpeb...@gmail.com wrote: Hi Lewis, I am sure that Chris will have no problem with you doing the RC2. Chris? It would be a good thing to

Re: 1.5 RC2

2012-05-22 Thread Mattmann, Chris A (388J)
+1 Sent from my iPhone On May 22, 2012, at 4:43 AM, Lewis John Mcgibbney lewis.mcgibb...@gmail.com wrote: Hi, As I say, I am able to stick time in tonight to roll this RC, however does anyone have a problem with me rolling the 2.0 RC tonight after the 1.5RC2?

Re: [VOTE] Apache Nutch 1.5 release rc #1

2012-05-09 Thread Mattmann, Chris A (388J)
Hey Julien, On May 9, 2012, at 3:11 AM, Julien Nioche wrote: Hi Chris Any chance you could do a RC2 for the trunk soonish? We've been a bit stuck since mid April and it would be nice to move on. If not I can try and spin a RC myself but it is likely to be hilarious :-) Haha, no worries.

Re: Suitable naming for Nutchgora branch?

2012-04-25 Thread Mattmann, Chris A (388J)
, Mattmann, Chris A (388J) chris.a.mattm...@jpl.nasa.gov wrote: Hi Guys, ++ Chris Mattmann, Ph.D. Senior Computer Scientist NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA Office: 171-266B, Mailstop: 171-246 Email: chris.a.mattm

Re: NUTCH-1129

2012-04-17 Thread Mattmann, Chris A (388J)
Hey Lewis, On Apr 17, 2012, at 3:35 AM, Lewis John Mcgibbney wrote: 3) We previously discussed implementing the Any23 parser plugin as a tika wrapper, therefore it would look very similar to parse-tika? I think it would be super awesome to add the Any23 parsing functionality as a Tika

Re: [VOTE] Apache Nutch 1.5 release rc #1

2012-04-16 Thread Mattmann, Chris A (388J)
Hi Julien, On Apr 16, 2012, at 2:02 AM, Julien Nioche wrote: Thanks Chris, -1 the versions of the deps for hadoop, tika and possibly others are not correct in the pom.xml found in the src archive and on the mvn repository, which will be a problem for whoever tries to use the pom.xml

Re: [VOTE] Apache Nutch 1.5 release rc #1

2012-04-16 Thread Mattmann, Chris A (388J)
Hey Sami, Thanks. I'll fix the 4 license headers you mention below as part of RC #2. Cheers, Chris On Apr 16, 2012, at 3:02 AM, Sami Siren wrote: On Mon, Apr 16, 2012 at 8:43 AM, Mattmann, Chris A (388J) chris.a.mattm...@jpl.nasa.gov wrote: Hi Folks, A candidate for the Nutch 1.5 release

Re: [VOTE] Apache Nutch 1.5 release rc #1

2012-04-16 Thread Mattmann, Chris A (388J)
file for RC #2 as you mention below and not sure why the extension was .tar.gz.tar.gz, I'll fix that too. Cheers, Chris On Apr 16, 2012, at 3:12 AM, Lewis John Mcgibbney wrote: Hi Chris, On Mon, Apr 16, 2012 at 6:43 AM, Mattmann, Chris A (388J) chris.a.mattm...@jpl.nasa.gov wrote: Hi

[VOTE] Apache Nutch 1.5 release rc #1

2012-04-15 Thread Mattmann, Chris A (388J)
Hi Folks, A candidate for the Nutch 1.5 release is available at: http://people.apache.org/~mattmann/apache-nutch-1.5/rc1/ The release candidate is a zip and tar.gz archive of the sources in: http://svn.apache.org/repos/asf/nutch/tags/release-1.5/ And a binary build suitable for

Re: Nutch 1.x trunk release

2012-04-10 Thread Mattmann, Chris A (388J)
Julien On 3 April 2012 15:30, Mattmann, Chris A (388J) chris.a.mattm...@jpl.nasa.gov wrote: Thanks Lewis! Cheers, Chris P.S. Hopefully by this weekend... On Apr 3, 2012, at 7:23 AM, Lewis John Mcgibbney wrote: Hi, On Tue, Apr 3, 2012 at 3:12 PM, Markus Jelsma markus.jel

Re: NutchGora release, and Nutch 1.x trunk release

2012-04-03 Thread Mattmann, Chris A (388J)
Hi Markus, On Apr 3, 2012, at 5:50 AM, Markus Jelsma wrote: Cool! Next time I'll ask infra to allow us to suppress notifications. Chris, will you RM one RC? And if possible list the detailed steps/commands in the process in case you don't have time to RM 1.6 when the time comes. The wiki

Re: NutchGora release, and Nutch 1.x trunk release

2012-04-03 Thread Mattmann, Chris A (388J)
Thanks Lewis! Cheers, Chris P.S. Hopefully by this weekend... On Apr 3, 2012, at 7:23 AM, Lewis John Mcgibbney wrote: Hi, On Tue, Apr 3, 2012 at 3:12 PM, Markus Jelsma markus.jel...@openindex.io wrote: Seems fine. Only updating KEYS is no longer necessary. Now sorted. Thanks

NutchGora release, and Nutch 1.x trunk release

2012-03-08 Thread Mattmann, Chris A (388J)
Hey Guys, I've got some cycles this weekend -- anyone up for a 1.5 release off trunk (stable), and a NutchGora branch release? I suggested this before [1] regarding NutchGora. I'm inclined to say let's do the following: 1. NutchGora: apache-nutch-2.0 - release 2.x series based on this branch 2.

Re: NutchGora release, and Nutch 1.x trunk release

2012-03-08 Thread Mattmann, Chris A (388J)
, 2012 at 3:03 PM, Markus Jelsma markus.jel...@openindex.io wrote: +1 1.5 has, again, many fixes and improvements, just as 1.4 had over 1.3. But i'd like to integrate Tika 1.1 after its pending release. Cheers On Thursday 08 March 2012 15:38:15 Mattmann, Chris A (388J) wrote: Hey Guys

Fwd: Google Summer of Code 2012 upcoming

2012-03-04 Thread Mattmann, Chris A (388J)
Guys, FYI...in case anyone is thinking of GSoC, deadlines are approaching. Process is described below... Thanks! Cheers, Chris Begin forwarded message: From: Ulrich Stärk u...@apache.org Date: March 4, 2012 9:01:07 AM PST To: p...@apache.org p...@apache.org Cc: d...@community.apache.org

Fwd: [blog post] Accumulo, Nutch, and Gora

2012-02-28 Thread Mattmann, Chris A (388J)
FYI...awesome! Begin forwarded message: From: Jason Trost jason.tr...@gmail.com Date: February 28, 2012 5:41:23 PM PST To: common-u...@hadoop.apache.org common-u...@hadoop.apache.org Subject: [blog post] Accumulo, Nutch, and Gora Reply-To: common-u...@hadoop.apache.org

Re: [DISCUSS] Nutchgora 2.0 release

2012-02-20 Thread Mattmann, Chris A (388J)
+1 guys. Just let me know when you are ready and I can RM it. Cheers, Chris On Feb 20, 2012, at 8:01 AM, Lewis John Mcgibbney wrote: Hi, Not ignoring Chris' comments, but addressing the points below first, please see comments. On Mon, Feb 20, 2012 at 2:57 PM, Ferdy Galema

Re: [DISCUSS] Nutchgora 2.0 release

2012-02-18 Thread Mattmann, Chris A (388J)
Hey Lewis, I'd be +1 to roll a Nutchgora 2.0 release. I could see dealing with this in two ways, neither of which I like better than the other: 1. Release the nutchgora branch as apache-nutch-2.0, and then nutchgora becomes the 2.0 branch of the system (and we could create branch-2.0) The 1.x

Fwd: [Announce] Google Summer of Code 2012

2012-02-05 Thread Mattmann, Chris A (388J)
Any Nutch Devs interested in a GSoC student? Begin forwarded message: From: Luciano Resende luckbr1...@gmail.com Date: February 4, 2012 10:40:03 AM PST To: d...@community.apache.org d...@community.apache.org, code-awards code-awa...@apache.org Subject: Fwd: [Announce] Google Summer of Code

Fwd: [Announce] Google Summer of Code 2012

2012-02-05 Thread Mattmann, Chris A (388J)
FYI Begin forwarded message: From: Ross Gardler rgard...@opendirective.com Date: February 5, 2012 1:45:18 PM PST To: d...@community.apache.org d...@community.apache.org Subject: RE: [Announce] Google Summer of Code 2012 Reply-To: d...@community.apache.org d...@community.apache.org For

Re: % of different content types out there on the web

2012-01-31 Thread Mattmann, Chris A (388J)
, we also explicitly filter out all/most unwanted suffixes. We do have a lot of suffixes that we encountered so far. On Saturday 28 January 2012 03:01:26 Mattmann, Chris A (388J) wrote: (sorry for the cross post) Hey Guys, I'm trying to find a good citation or estimate (if anyone has

Re: [DISCUSS] Issues with Fetcher

2012-01-21 Thread Mattmann, Chris A (388J)
Hi Ken, On Jan 21, 2012, at 10:33 AM, Ken Krugler wrote: My own personal favorite area would be to integrate with crawler-commons. +1. Would you crawler-commons guys be interested in bringing that code to Apache? How about bringing it over to Nutch? Would that be something you'd be

Re: [jira] [Commented] (NUTCH-1237) Improve javac arguements for more verbose output

2012-01-06 Thread Mattmann, Chris A (388J)
Yay, all I heard was that it's building again woo hoo! On Jan 6, 2012, at 9:03 AM, Markus Jelsma wrote: Ah, i get 88 warnings now but things build fine. This is indeed quite more verbose :) On Tuesday 27 December 2011 17:28:31 Lewis John McGibbney (Commented) (JIRA) wrote: [

Re: Build failed in Jenkins: Nutch-trunk #1702

2011-12-25 Thread Mattmann, Chris A (388J)
Merry Christmas buddy! Cheers, Chris On Dec 25, 2011, at 9:14 AM, Lewis John Mcgibbney wrote: Hi Guys, Our trunk builds have been broken since migrating to new Hadoop 0.20.2 and migrating CrawlDBScanner to new MR API e.g. trunk build [1] 1698. Looking to the stack trace, I'm assuming that

Re: get rid of outlink code for Tika

2011-12-21 Thread Mattmann, Chris A (388J)
+1 from me -- those 3 Tika content handlers should take care of it... Cheers, Chris On Dec 21, 2011, at 6:51 AM, Markus Jelsma wrote: Hi, For using Boilerpipe we need LinkCH, BoilerpipeCH and TeeCH in Tika. LinkCH returns all URL's with some meta data such as title etc. Fixes for old

Re: Improving API Java Documentation

2011-12-12 Thread Mattmann, Chris A (388J)
Hi Lewis, +1 from me to the update and to logging a JIRA issue. Always nice to see an associated changelog entry for any (even non trivial) updates, short of typos and error corrections in docs/etc. Up to you though, since you're the one doing the work :-) Cheers, Chris On Dec 12, 2011, at

Re: Best way to get files out of segment directories

2011-11-30 Thread Mattmann, Chris A (388J)
Mattmann, Chris A (388J) chris.a.mattm...@jpl.nasa.gov schreef: OK, of course, I figured it out, and updated my program :-) You can see it on Github below. I'm going to clean up and generalize this program because I think it's of general use. I'll create an issue shortly. I'm thinking

Best way to get files out of segment directories

2011-11-28 Thread Mattmann, Chris A (388J)
Hey Guys, So, I've completed my crawl of the vault.fbi.gov website for my class that I'm preparing for. I've got: [chipotle:local/nutch/framework] mattmann% du -hs crawl 28G crawl [chipotle:local/nutch/framework] mattmann% [chipotle:local/nutch/framework] mattmann% ls -l crawl/segments/

Re: Best way to get files out of segment directories

2011-11-28 Thread Mattmann, Chris A (388J)
have to M/R this. Just wanted to let you guys know where I'm at, and what I've been trying. Thanks, Chris On Nov 28, 2011, at 7:23 PM, Mattmann, Chris A (388J) wrote: Hey Guys, So, I've completed my crawl of the vault.fbi.gov website for my class that I'm preparing for. I've got

Re: Best way to get files out of segment directories

2011-11-28 Thread Mattmann, Chris A (388J)
files that were downloaded by Nutch. Do you guys see this as a useful tool? If so, I'll contribute it this week for 1.5. Cheers, Chris On Nov 28, 2011, at 7:32 PM, Mattmann, Chris A (388J) wrote: Hey Guys, One more thing. Just to let you know I've followed this blog here: http

[RESULT] [VOTE] Apache Nutch 1.4 release rc #2

2011-11-26 Thread Mattmann, Chris A (388J)
Hi Everyone, This VOTE has passed: +1 PMC Julien Nioche Markus Jelsma Lewis John McGibbney Chris Mattmann I'll go ahead and update the website and push the release out to the mirrors. Thanks for VOTE'ing and for your patience! Cheers, Chris

[ANNOUNCE] Apache Nutch 1.4 released

2011-11-26 Thread Mattmann, Chris A (388J)
(...apologies for the cross posting...) The Apache Nutch project is pleased to announce the release of Apache Nutch 1.4. The release contents have been pushed out to the main Apache release site so the releases should be available as soon as the mirrors get the syncs. Apache Nutch is an

Re: 2 things I noticed that I will file JIRA issues + fix

2011-11-25 Thread Mattmann, Chris A (388J)
Mattmann, Chris A (388J) wrote: Hi Markus, On Nov 24, 2011, at 12:03 PM, Markus Jelsma wrote: So, what's the point of that initial if(...) block outside of the for loop. Isn't it redundant? This is trunk? I've been and still am working on some issues for a new feature in this part

2 things I noticed that I will file JIRA issues + fix

2011-11-24 Thread Mattmann, Chris A (388J)
...after I get back from Thanksgiving dinner :-) 1. In URLFilterChecker, the cmd line tool requires URLs to be fed into it on STDIN, but that isn't documented anywhere, even in the tool help printed to STDOUT. I'll fix that. 2. In ParseOutputFormat, I see a code block: {code} //

Re: Dependency Injection

2011-11-22 Thread Mattmann, Chris A (388J)
Hey PJ, On Nov 22, 2011, at 10:47 AM, PJ Herring wrote: Hey Chris, Thanks for the response. I looked at the documents you sent me, and I really do think incorporating some kind of DI Framework could be a great addition to Nutch. I have a general plan of attack, but I'll try to write

Re: Dependency Injection

2011-11-21 Thread Mattmann, Chris A (388J)
Hey PJ, You aren't being an ass at all. You're asking an important question, and something I've been interested in for a while. Here are some relevant threads to take a look at: http://wiki.apache.org/nutch/Nutch2Architecture

Re: 0.2-SNAPSHOT now on apache repository

2011-11-19 Thread Mattmann, Chris A (388J)
+1 from me, Lewis, great work. Cheers, Chris On Nov 19, 2011, at 4:11 AM, Lewis John Mcgibbney wrote: Hi, Please see here [1], and associated issue logged in Nutch Jira [2]. As I explain in the issue, although Gora trunk is not stable there is ongoing work to fix this. Thanks for now

Re: Lewis John McGibbney sent a message via SimilarPages – A web discovery and search add-on

2011-11-17 Thread Mattmann, Chris A (388J)
Awesome news, great to hear! Cheers, Chris On Nov 17, 2011, at 8:57 AM, Lewis John Mcgibbney wrote: Hi, Some more positives here. Lewis -- Forwarded message -- From: Pietro Borradori pietro.borrad...@similarpages.com Date: Thu, Nov 17, 2011 at 4:46 PM Subject: Fw:

Re: [VOTE] Apache Nutch 1.4 release rc #1

2011-11-16 Thread Mattmann, Chris A (388J)
Hadoop _SUCCESS file (markus) Thanks Julien On 9 November 2011 10:21, Mattmann, Chris A (388J) chris.a.mattm...@jpl.nasa.gov wrote: Hi Julien, Thanks. OK, so I will respin an RC for 1.4 that fixes the naming screw up. I already created the KEYS file so we're fine

Re: Community Comments

2011-11-15 Thread Mattmann, Chris A (388J)
+1 to the GUI comment, even though I haven't made one yet, it's definitely on my list of items should I find the cycles to do more besides releasing. Thanks! Cheers, Chris On Nov 15, 2011, at 1:01 PM, Markus Jelsma wrote: Hi Guys, During ApacheCon I made a point of trying to gauge how

Re: Update to release information tutorial

2011-11-15 Thread Mattmann, Chris A (388J)
WOOT! Lewis and I talked about updating this at ApacheCon NA and I sent him the OODT release guide and he's done a masterful job updating ours. Thanks Lewis you rock man. Cheers, Chris On Nov 15, 2011, at 1:56 PM, Lewis John Mcgibbney wrote: Hi guys, Please see here [1] for my attempt
