NUTCH-2300 Fetcher to optionally save robots.txt

Merge branch 'SaveRobotsTxt' of https://github.com/sebastian-nagel/nutch, this closes #141
Project: http://git-wip-us.apache.org/repos/asf/nutch/repo
Commit: http://git-wip-us.apache.org/repos/asf/nutch/commit/3fca1a59
Tree: http://git-wip-us.apache.org/repos/asf/nutch/tree/3fca1a59
Diff: http://git-wip-us.apache.org/repos/asf/nutch/diff/3fca1a59

Branch: refs/heads/master
Commit: 3fca1a5902a151867733806fc0511f18ab0b4e6f
Parents: d37b7ce f3af9a5
Author: Sebastian Nagel <sna...@apache.org>
Authored: Mon Aug 22 23:50:16 2016 +0200
Committer: Sebastian Nagel <sna...@apache.org>
Committed: Mon Aug 22 23:50:16 2016 +0200

----------------------------------------------------------------------
 conf/nutch-default.xml                          |  10 ++
 .../org/apache/nutch/fetcher/FetcherThread.java |  29 +++-
 .../org/apache/nutch/parse/ParseSegment.java    |  11 +-
 .../org/apache/nutch/protocol/Protocol.java     |  20 ++-
 .../apache/nutch/protocol/RobotRulesParser.java | 174 +++++++++++++++----
 .../nutch/protocol/http/api/HttpBase.java       |  29 ++--
 .../protocol/http/api/HttpRobotRulesParser.java |  52 +++++-
 .../org/apache/nutch/protocol/file/File.java    |  13 +-
 .../java/org/apache/nutch/protocol/ftp/Ftp.java |   9 +-
 .../nutch/protocol/ftp/FtpRobotRulesParser.java |  17 +-
 10 files changed, 286 insertions(+), 78 deletions(-)
----------------------------------------------------------------------