[
https://issues.apache.org/jira/browse/NUTCH-1921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14321408#comment-14321408
]
Sebastian Nagel commented on NUTCH-1921:
----------------------------------------
Looks good, [~markus17]!
- The issue title could be misunderstood: the change is about fetching and the
HTTP protocol, although it affects parsing (or how segments are handled). A
title such as "Optionally disable HTTP if-modified-since header" may fit better.
- The property {{http.enable.if.modified.since.header}} should be documented in
nutch-default.xml, including its impact on parse filters (etc.).
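
For illustration, an entry along these lines might work in nutch-default.xml. This is only a sketch: the property name comes from the patch, but the default value and the description wording are assumptions and should be checked against what the patch actually does.

```xml
<!-- Sketch only: the default value and the description text are assumptions,
     not taken from the attached patch. -->
<property>
  <name>http.enable.if.modified.since.header</name>
  <value>true</value>
  <description>Whether the HTTP protocol plugin sends the If-Modified-Since
  header when fetching. If disabled, servers are not asked for a conditional
  response, so pages should be fetched in full instead of being marked
  fetch_not_modified. Disabling this can be useful after changing parse
  filters or index filters, because fetch_not_modified records are not parsed
  or indexed and the changes would otherwise never show up in the index.
  </description>
</property>
```

A site that wants every page re-parsed after a filter change would presumably override the value in nutch-site.xml rather than edit nutch-default.xml.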
> Optionally parse fetch_not_modified
> -----------------------------------
>
> Key: NUTCH-1921
> URL: https://issues.apache.org/jira/browse/NUTCH-1921
> Project: Nutch
> Issue Type: Bug
> Components: fetcher
> Affects Versions: 1.9
> Reporter: Markus Jelsma
> Assignee: Markus Jelsma
> Fix For: 1.11
>
> Attachments: NUTCH-1921-trunk.patch
>
>
> Records with fetch_not_modified are not parsed, are not passed through
> parse filters or index filters, and are not indexed. This is a huge
> problem if you have modified a parse filter, an indexing filter, or any other
> behaviour in the pipeline, because the changes never show up in the index.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)