Re: [EXTERNAL] [PROPOSAL] Replace whitelist blacklist with allowlist denylist

2020-06-09 Thread Chris Mattmann
+1 From: lewis john mcgibbney Reply-To: "dev@nutch.apache.org" Date: Tuesday, June 9, 2020 at 3:21 PM To: "dev@nutch.apache.org" Subject: [EXTERNAL] [PROPOSAL] Replace whitelist blacklist with allowlist denylist Hi Folks, What I would like to propose that we replace

Re: Maven vs Gradle for Nutch Build System

2018-11-29 Thread Chris Mattmann
Thamme worked on this…check where he left off… From: lewis john mcgibbney Reply-To: "dev@nutch.apache.org" Date: Thursday, November 29, 2018 at 1:13 PM To: "dev@nutch.apache.org" Subject: Maven vs Gradle for Nutch Build System Hi Folks, Seb and I were talking build systems this

FW: Solr/Nutch/Tika config for PDF crawling

2018-10-03 Thread Chris Mattmann
From: bineesh k Date: Wednesday, October 3, 2018 at 12:37 AM To: "dev-ow...@tika.apache.org" Subject: Solr/Nutch/Tika config for PDF crawling Hello Tika Team, Need help on Solr/Nutch setup for crawling the PDF pages. We are using Nutch 1.15 and Solr 7.3.1 for our setup.

Re: Preparing to release Nutch 1.15 ?

2018-06-11 Thread Chris Mattmann
++1! Sounds great. Cheers, Chris From: Sebastian Nagel Reply-To: "dev@nutch.apache.org" Date: Monday, June 11, 2018 at 7:35 AM To: "u...@nutch.apache.org" Cc: "dev@nutch.apache.org" Subject: Preparing to release Nutch 1.15 ? Hi all, almost 80 fixes and

Re: [VOTE] Release Apache Nutch 1.14 RC#1

2017-12-22 Thread Chris Mattmann
Yay, go Seb, go! On 12/22/17, 8:38 AM, "Sebastian Nagel" wrote: Hi Folks, thanks to everyone who was able to review the release candidate! 72 hours have passed, please see below for vote results. [8] +1 Release this package as Apache

Re: [DISCUSS] Release 1.14?

2017-12-08 Thread Chris Mattmann
+1 this makes sense to me! Happy to help test. Cheers, Chris On 12/8/17, 2:53 PM, "Sebastian Nagel" wrote: Hi all, 50+ issues fixed https://issues.apache.org/jira/projects/NUTCH/versions/12340218 Of course, as always and still many

Re: Request for patches review

2017-11-09 Thread Chris Mattmann
Hey Semyon, FWIW, use this for GitHub contribution guidelines: https://github.com/apache/nutch/#contributing I may have some time this weekend to look at 2441. Thanks, Chris On 11/9/17, 8:34 AM, "Semyon Semyonov" wrote: Dear all, Could you review

Re: Crawler-Commons 0.8 released

2017-06-09 Thread Chris Mattmann
Great job! From: Julien Nioche Reply-To: "dev@nutch.apache.org" Date: Friday, June 9, 2017 at 2:28 AM To: "crawler-comm...@googlegroups.com" , "bixo-...@yahoogroups.com" ,

Re: Impolite crawling using NUTCH

2016-12-02 Thread Chris Mattmann
he.org/nutch/WhiteListRobots ++++++ Chris Mattmann, Ph.D. Principal Data Scientist, Engineering Administrative Office (3010) Manager, Open Source Projects Formulation and Development Office (8212) NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA Office: 180-503E, Mailstop:

[VOTE] Moving to Git

2016-01-07 Thread Chris Mattmann
Hi Everyone, I proposed this earlier, and we said we’d wait until after the 1.11 release. So it’s time to VOTE to move Nutch to Git. So far, the following people have expressed +1s and if I don’t hear otherwise, I will implicitly count their VOTE from the DISCUSS thread: +1 PMC Chris Mattmann

Re: [VOTE] Moving to Git

2016-01-07 Thread Chris Mattmann
tly count their VOTE from the DISCUSS >thread: > >+1 PMC > >Chris Mattmann* >Sebastien Nagel* >Michael Joyce* >Asitang Mishra* >Dennis Kubes* >BlackIce > >Everyone else (or those above that would like to amend their VOTE), >please VOTE below. I will leave

Re: Review Request 33112: NUTCH-1927: Create a whitelist of IPs/hostnames to allow skipping of RobotRules parsing

2015-04-15 Thread Chris Mattmann
will be ignored allowed:http://baron.pagemewhen.com/~chris/ [chipotle:~/src/nutch] mattmann% Thanks, Chris Mattmann

Re: Review Request 33112: NUTCH-1927: Create a whitelist of IPs/hostnames to allow skipping of RobotRules parsing

2015-04-14 Thread Chris Mattmann
will be ignored allowed:http://baron.pagemewhen.com/~chris/ [chipotle:~/src/nutch] mattmann% Thanks, Chris Mattmann

Review Request 33112: NUTCH-1927: Create a whitelist of IPs/hostnames to allow skipping of RobotRules parsing

2015-04-12 Thread Chris Mattmann
:http://baron.pagemewhen.com/~chris/ [chipotle:~/src/nutch] mattmann% Thanks, Chris Mattmann

Re: Review Request 32451: keyPrefix option for CommonCrawlDataDumper tool

2015-03-27 Thread Chris Mattmann
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32451/#review77856 --- Ship it! Ship It! - Chris Mattmann On March 26, 2015, 2:47 a.m

Re: [nsf-polar-usc-students] ExceptionInInitializerError caused by NPE

2014-11-20 Thread Chris Mattmann
Great, can you attach a patch for this? Chris Mattmann chris.mattm...@gmail.com -Original Message- From: MengYing Wang mengyingwa...@gmail.com Date: Thursday, November 20, 2014 at 7:02 PM To: Lewis John Mcgibbney lewis.mcgibb...@gmail.com Cc: dev

Re: Review Request 9119: Create SegmentContentDumperTool for easily extracting out file contents from SegmentDirs

2014-09-20 Thread Chris Mattmann
. To reply, visit: https://reviews.apache.org/r/9119/#review52796 --- On Sept. 10, 2014, 3:15 a.m., Chris Mattmann wrote: --- This is an automatically generated e-mail. To reply

Re: Review Request 9119: Create SegmentContentDumperTool for easily extracting out file contents from SegmentDirs

2014-09-20 Thread Chris Mattmann
/9119/#review52809 --- On Sept. 10, 2014, 3:15 a.m., Chris Mattmann wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/9119

Re: Review Request 9119: Create SegmentContentDumperTool for easily extracting out file contents from SegmentDirs

2014-09-10 Thread Chris Mattmann
- ./trunk/src/java/org/apache/nutch/tools/FileDumper.java PRE-CREATION Diff: https://reviews.apache.org/r/9119/diff/ Testing --- Testing it on DARPA XDATA XNET. Thanks, Chris Mattmann

Re: Review Request 9119: Create SegmentContentDumperTool for easily extracting out file contents from SegmentDirs

2014-09-10 Thread Chris Mattmann
--- On Sept. 6, 2014, 4:57 a.m., Chris Mattmann wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/9119

Re: Review Request 9119: Create SegmentContentDumperTool for easily extracting out file contents from SegmentDirs

2014-09-05 Thread Chris Mattmann
. Diffs (updated) - ./trunk/src/java/org/apache/nutch/tools/FileDumper.java PRE-CREATION Diff: https://reviews.apache.org/r/9119/diff/ Testing --- Testing it on DARPA XDATA XNET. Thanks, Chris Mattmann

Re: Common Crawl's Move to Apache Nutch

2014-02-21 Thread Chris Mattmann
That is frickin' awesome Juls. You may want to contact Sally (s...@apache.org), ASF VP of Press and Marketing and suggest to her that this deserves a Tweet, at the least. Cheers! Chris -Original Message- From: Julien Nioche lists.digitalpeb...@gmail.com Reply-To: dev@nutch.apache.org

Submission to ApacheCon on Tika

2014-01-30 Thread Chris Mattmann
Hey Guys, I submitted the below talk on Apache Tika, Nutch and Solr to ApacheCon NA 2014: "Real Data Science: Exploring the FBI's Vault dataset with Apache Tika, Nutch and Solr". Event: ApacheCon North America. Submission Type: Lightning Talk. Category: Developer. Biography: Chris Mattmann has a wealth

Re: Alternative to Forrest for Nutch website

2013-10-22 Thread Chris Mattmann
Hey Jul, A lot are using the Apache CMS: http://www.apache.org/dev/cms.html That's infra recommended. Besides that some are using Confluence; some use Maven; others use Markdown via CMS, etc. My +1 would be for the CMS, but I don't have time to set it up (luckily infra can help and we can

Review Request: Create SegmentContentDumperTool for easily extracting out file contents from SegmentDirs

2013-01-27 Thread Chris Mattmann
://reviews.apache.org/r/9119/diff/ Testing --- Testing it on DARPA XDATA XNET. Thanks, Chris Mattmann

[ANNOUNCE] Apache Nutch 1.1 released

2010-06-18 Thread Chris Mattmann
to verify the downloads using signatures found on the Apache site: http://www.apache.org/dist/nutch/KEYS-1.1.txt For more information on Apache Nutch, visit the project home page: http://nutch.apache.org -- Chris Mattmann (on behalf of the Apache Nutch community)