Fetcher does not save a page's Last-Modified value in CrawlDatum
----------------------------------------------------------------

                 Key: NUTCH-933
                 URL: https://issues.apache.org/jira/browse/NUTCH-933
             Project: Nutch
          Issue Type: Bug
          Components: fetcher
    Affects Versions: 1.2
            Reporter: Joe Kemp


I added the following code to the output method, just after the 
if (content != null) statement:


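        String lastModified = metadata.get("Last-Modified");
        if (lastModified != null && !lastModified.equals("")) {
                try {
                        // DateUtil and DateParseException are presumably the ones in
                        // org.apache.commons.httpclient.util
                        Date lastModifiedDate = DateUtil.parseDate(lastModified);
                        datum.setModifiedTime(lastModifiedDate.getTime());
                } catch (DateParseException e) {
                        // Unparseable header: leave the modified time unset
                }
        }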
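
To try the parsing step outside of Nutch, here is a minimal self-contained 
sketch. It assumes the header arrives in RFC 1123 format and uses java.time 
rather than the DateUtil class above. The class and method names 
(LastModifiedParser, parseLastModifiedMillis) are hypothetical and are not 
part of Nutch.

```java
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

// Hypothetical helper, not part of Nutch.
public class LastModifiedParser {

    /**
     * Parses a Last-Modified header value in RFC 1123 format.
     * Returns epoch milliseconds, or -1 if the value is missing or unparseable.
     */
    public static long parseLastModifiedMillis(String lastModified) {
        if (lastModified == null || lastModified.trim().isEmpty()) {
            return -1L;
        }
        try {
            ZonedDateTime parsed = ZonedDateTime.parse(
                    lastModified.trim(), DateTimeFormatter.RFC_1123_DATE_TIME);
            return parsed.toInstant().toEpochMilli();
        } catch (DateTimeParseException e) {
            // Same policy as the snippet above: ignore a header that cannot be parsed
            return -1L;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseLastModifiedMillis("Wed, 21 Oct 2015 07:28:00 GMT"));
        System.out.println(parseLastModifiedMillis("not a date"));
    }
}
```

A caller would only pass the result to datum.setModifiedTime when it is not 
-1, which matches the empty catch block above.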


With this change, recrawls now return HTTP 304 (Not Modified) for pages that 
haven't changed. I still need to do further testing. A configuration 
parameter to turn this behavior off may also be needed, so that pages can be 
forced to refresh.
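
One way such a switch could work is sketched below. It assumes the recrawl 
sends the stored modified time as an If-Modified-Since header, which is what 
the 304 responses suggest. The class name, method name, and forceRefresh flag 
are all hypothetical; no such Nutch setting is implied.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Locale;
import java.util.Optional;

// Hypothetical sketch, not Nutch code.
public class ConditionalFetchSketch {

    // HTTP date format (two-digit day, always GMT)
    private static final DateTimeFormatter HTTP_DATE = DateTimeFormatter
            .ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.US)
            .withZone(ZoneOffset.UTC);

    /**
     * Builds the If-Modified-Since value for a recrawl.
     * Returns empty when no modified time is stored or when a refresh is
     * forced, so the request stays unconditional and the full page is fetched.
     */
    public static Optional<String> ifModifiedSince(long modifiedTimeMillis,
                                                   boolean forceRefresh) {
        if (forceRefresh || modifiedTimeMillis <= 0) {
            return Optional.empty();
        }
        return Optional.of(HTTP_DATE.format(Instant.ofEpochMilli(modifiedTimeMillis)));
    }
}
```

With forceRefresh set, the stored Last-Modified value is kept in the datum 
but is not sent, so the server has no basis for answering 304.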
