[ 
https://issues.apache.org/jira/browse/NUTCH-1921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14321408#comment-14321408
 ] 

Sebastian Nagel commented on NUTCH-1921:
----------------------------------------

Looks good, [~markus17]!
- The issue title could be misunderstood: the change is about fetching and the 
HTTP protocol, although it affects parsing (or how segments are handled). Maybe 
the title "Optionally disable HTTP if-modified-since header" fits better.
- The property {{http.enable.if.modified.since.header}} should be documented in 
nutch-default.xml; its impact on parse filters (etc.) could be explained there.
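
A rough sketch of what such a nutch-default.xml entry might look like. Only the property name comes from the patch discussion; the default value and the description wording are assumptions for illustration:

```xml
<!-- Sketch only: the value and description text are assumed, not taken from the patch. -->
<property>
  <name>http.enable.if.modified.since.header</name>
  <!-- Assumed default: keep sending the header, i.e. the existing behaviour. -->
  <value>true</value>
  <description>Whether the HTTP protocol plugin sends an If-Modified-Since
  header. When enabled, unmodified pages are recorded as fetch_not_modified
  and are not parsed, not passed through parse or index filters, and not
  indexed. Disable it if you have changed parse or index filters and need
  unmodified pages to be processed again.
  </description>
</property>
```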

> Optionally parse fetch_not_modified
> -----------------------------------
>
>                 Key: NUTCH-1921
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1921
>             Project: Nutch
>          Issue Type: Bug
>          Components: fetcher
>    Affects Versions: 1.9
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>             Fix For: 1.11
>
>         Attachments: NUTCH-1921-trunk.patch
>
>
> Records with fetch_not_modified are not parsed, are not passed through 
> parse filters or index filters, and are not indexed. This is a huge 
> problem if you have modified a parse filter, an indexing filter, or any other 
> behaviour in the pipeline, because the changes never show up in the index.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
