[ https://issues.apache.org/jira/browse/NUTCH-933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lewis John McGibbney updated NUTCH-933:
---------------------------------------

    Fix Version/s: 1.7

> Fetcher does not save a page's Last-Modified value in CrawlDatum
> ----------------------------------------------------------------
>
>                 Key: NUTCH-933
>                 URL: https://issues.apache.org/jira/browse/NUTCH-933
>             Project: Nutch
>          Issue Type: Bug
>          Components: fetcher
>    Affects Versions: 1.2
>            Reporter: Joe Kemp
>             Fix For: 1.7
>
>
> I added the following code to the output method, just after the
> if (content != null) statement:
>
>     // Copy the server's Last-Modified header into the CrawlDatum so a later
>     // fetch can send a conditional request. DateUtil and DateParseException
>     // are assumed to be org.apache.commons.httpclient.util (HttpClient 3.x).
>     String lastModified = metadata.get("Last-Modified");
>     if (lastModified != null && !lastModified.isEmpty()) {
>       try {
>         Date lastModifiedDate = DateUtil.parseDate(lastModified);
>         datum.setModifiedTime(lastModifiedDate.getTime());
>       } catch (DateParseException e) {
>         // Unparseable header: leave the stored modified time unchanged.
>       }
>     }
>
> I now get 304 responses for pages that haven't changed when I recrawl.
> This needs further testing. It might also need a configuration parameter
> to turn off this behavior, so that pages can be forced to refresh.

-- 
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
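The reporter's snippet depends on Nutch and Commons HttpClient classes. As a dependency-free sketch (assuming only the JDK's java.time API, Java 8+), the round trip it enables looks roughly like this: parse the Last-Modified header into epoch millis (the unit CrawlDatum.setModifiedTime takes), turn that stored value back into an If-Modified-Since header on the next fetch, and let the server answer 304 when the page has not changed since. The class name, the method names and the `enabled` flag are hypothetical; the flag only stands in for the configuration parameter the reporter proposes and is not an existing Nutch property.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Locale;

public class LastModifiedSketch {

    // HTTP-date output format (IMF-fixdate): two-digit day, always GMT.
    // RFC_1123_DATE_TIME is not used for output because it prints "6" instead of "06".
    private static final DateTimeFormatter HTTP_DATE =
            DateTimeFormatter.ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.US)
                    .withZone(ZoneOffset.UTC);

    /** Parses a Last-Modified header value to epoch millis; returns -1 if missing or unparseable. */
    public static long parseLastModified(String value) {
        if (value == null || value.isEmpty()) {
            return -1L;
        }
        try {
            return ZonedDateTime.parse(value, DateTimeFormatter.RFC_1123_DATE_TIME)
                    .toInstant().toEpochMilli();
        } catch (DateTimeParseException e) {
            return -1L; // like the original snippet, silently ignore bad dates
        }
    }

    /**
     * Builds an If-Modified-Since value from a stored modified time.
     * 'enabled' stands in for the hypothetical config switch. Returns null when no
     * conditional header should be sent (switch off, or no time stored), which
     * forces a full refetch.
     */
    public static String ifModifiedSince(long modifiedTime, boolean enabled) {
        if (!enabled || modifiedTime <= 0) {
            return null;
        }
        return HTTP_DATE.format(Instant.ofEpochMilli(modifiedTime));
    }

    /** Server-side rule (RFC 7232, section 3.3): 304 if not modified after the given date. */
    public static boolean notModified(long ifModifiedSince, long lastModified) {
        return lastModified <= ifModifiedSince;
    }

    public static void main(String[] args) {
        long stored = parseLastModified("Sun, 06 Nov 1994 08:49:37 GMT");
        System.out.println("If-Modified-Since: " + ifModifiedSince(stored, true));
        // Unchanged page: the server's Last-Modified equals the stored value.
        System.out.println("304? " + notModified(stored, stored));
        // Switch turned off: no conditional header, so the page is refetched.
        System.out.println("Header when disabled: " + ifModifiedSince(stored, false));
    }
}
```

Returning no header when the switch is off is one simple way to realize the "force refresh" option; where such a check would live inside Nutch's fetcher and protocol plugins is not shown here.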