[ https://issues.apache.org/jira/browse/NUTCH-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lewis John McGibbney updated NUTCH-1419:
----------------------------------------
Attachment: NUTCH-1419-2.x.patch
NUTCH-1419-trunk.patch
Updated the patch for trunk to accommodate the changes made to the codebase
since Seb initially uploaded his patch.
Also uploaded a patch for 2.x, which adds an .isSuccess method for
consistency with trunk.
Please review and comment.
Thank you
> parsechecker and indexchecker to report protocol status
> -------------------------------------------------------
>
> Key: NUTCH-1419
> URL: https://issues.apache.org/jira/browse/NUTCH-1419
> Project: Nutch
> Issue Type: Improvement
> Components: indexer, parser
> Affects Versions: nutchgora, 1.6
> Reporter: Sebastian Nagel
> Priority: Minor
> Attachments: NUTCH-1419-1.patch, NUTCH-1419-2.x.patch,
> NUTCH-1419-trunk.patch
>
>
> Parsechecker and indexchecker should report the protocol status when the
> fetch was not successful (status other than 200/ok).
> In case of a redirect, the protocol status contains the URL a redirect points
> to. Usually, this URL should be checked instead of the original one which is
> not indexed. The content of a redirect response is less useful (and often
> empty):
> {code}
> % nutch indexchecker http://lucene.apache.org/nutch/
> fetching: http://lucene.apache.org/nutch/
> parsing: http://lucene.apache.org/nutch/
> contentType: text/html
> content : 301 Moved Permanently Moved Permanently The document has
> moved here . Apache/2.4.1 (Unix) OpenSSL/1.
> title : 301 Moved Permanently
> host : lucene.apache.org
> tstamp : Tue Jul 03 13:27:32 CEST 2012
> url : http://lucene.apache.org/nutch/
> {code}
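
The behaviour requested in the description above could be sketched roughly as
follows. This is a minimal, self-contained Java illustration and not the code
from the attached patches: the class FetchResult, its isSuccess and isRedirect
helpers, the describe method, and the redirect target http://example.org/new/
are hypothetical names and values chosen for this example, not the Nutch API.

```java
public class ProtocolStatusReport {

    /** Hypothetical fetch outcome: HTTP status code plus redirect target (null if none). */
    static final class FetchResult {
        final int code;
        final String redirectTarget;

        FetchResult(int code, String redirectTarget) {
            this.code = code;
            this.redirectTarget = redirectTarget;
        }

        /** Only 200/ok counts as a successful fetch in this sketch. */
        boolean isSuccess() {
            return code == 200;
        }

        /** A 3xx status that also carries a target URL is treated as a redirect. */
        boolean isRedirect() {
            return code >= 300 && code < 400 && redirectTarget != null;
        }
    }

    /** Builds the line a checker tool might print before deciding whether to parse. */
    static String describe(String url, FetchResult result) {
        if (result.isSuccess()) {
            return "fetch ok: " + url;
        }
        StringBuilder report = new StringBuilder();
        report.append("fetch failed with protocol status ")
              .append(result.code)
              .append(": ")
              .append(url);
        if (result.isRedirect()) {
            // Surface the target so the user can check that URL instead.
            report.append(" (redirects to ").append(result.redirectTarget).append(")");
        }
        return report.toString();
    }

    public static void main(String[] args) {
        // The redirect target here is made up for illustration.
        FetchResult moved = new FetchResult(301, "http://example.org/new/");
        System.out.println(describe("http://lucene.apache.org/nutch/", moved));
    }
}
```

The point of the sketch is only that a non-200 status is reported instead of
parsing the (often empty) redirect body, and that the redirect target is shown
so it can be checked in place of the original URL.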