[
https://issues.apache.org/jira/browse/NUTCH-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13808086#comment-13808086
]
Talat UYARER commented on NUTCH-1651:
-------------------------------------
Hi [~amuseme.lu], lastModifiedTime is a fetch parameter. We use it in
protocol-http to check whether a web page has changed. If the page has changed
since our last modified time, the page's host returns a 200 status code with
the page's content. If the page has not changed, the host returns a 304 status
code without content. This is simply how HTTP conditional requests work. You
can see it in HttpResponse.java in protocol-http.
{code:title=HttpResponse.java|borderStyle=solid}
if (page.isReadable(WebPage.Field.MODIFIED_TIME.getIndex())) {
  reqStr.append("If-Modified-Since: " +
      HttpDateFormat.toString(page.getModifiedTime()));
  reqStr.append("\r\n");
}
{code}
If we don't set it there, we always use the first fetch time as modifiedTime.
This means Nutch will unnecessarily fetch pages that have not been modified.
Moreover, I think a user who wants to check for content changes will use the
signature methods.
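To make the conditional request above concrete, here is a minimal standalone sketch of how the If-Modified-Since header could be built and how a 304 response could be told apart from a 200. It is not the Nutch code: the class ConditionalGetSketch, the methods buildRequest and hasNewContent, and the rule that a value of 0 means "no modified time known" are all assumptions made for illustration.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical helper, not part of Nutch.
final class ConditionalGetSketch {

  // HTTP dates are always expressed in GMT with English day and month names.
  private static final DateTimeFormatter HTTP_DATE =
      DateTimeFormatter.ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.US)
          .withZone(ZoneOffset.UTC);

  // Builds a raw GET request. The If-Modified-Since header is only added
  // when a modified time is known (assumed here to mean a value above 0).
  static String buildRequest(String path, String host, long modifiedTimeMs) {
    StringBuilder reqStr = new StringBuilder();
    reqStr.append("GET ").append(path).append(" HTTP/1.0\r\n");
    reqStr.append("Host: ").append(host).append("\r\n");
    if (modifiedTimeMs > 0) {
      reqStr.append("If-Modified-Since: ")
          .append(HTTP_DATE.format(Instant.ofEpochMilli(modifiedTimeMs)))
          .append("\r\n");
    }
    reqStr.append("\r\n"); // blank line ends the header section
    return reqStr.toString();
  }

  // 304 Not Modified carries no body, so there is nothing new to parse.
  static boolean hasNewContent(int statusCode) {
    return statusCode != 304;
  }
}
```

If the stored modified time were left at zero, the header would never be sent in this sketch, and the host would have no way to answer with 304.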
> modifiedTime and prevmodifiedTime never set
> --------------------------------------------
>
> Key: NUTCH-1651
> URL: https://issues.apache.org/jira/browse/NUTCH-1651
> Project: Nutch
> Issue Type: Bug
> Affects Versions: 2.2.1
> Reporter: Talat UYARER
> Fix For: 2.3
>
> Attachments: NUTCH-1651.patch
>
>
> modifiedTime is never set. If you use DefaultFetchScheduler, modifiedTime is
> always zero by default. If you use AdaptiveFetchScheduler, modifiedTime is
> set only once, at the beginning, by the zero check in AdaptiveFetchScheduler.
> This is not sufficient, since modifiedTime needs to be updated whenever a
> last modified time is available. We corrected this with a patch.
> Also we noticed that prevModifiedTime is not written to database and we
> corrected that too.
> With this patch, whenever lastModifiedTime is available, we do two things.
> First we copy the current modifiedTime in the Page object into
> prevModifiedTime. After that we set modifiedTime to lastModifiedTime.
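The two-step update described in the issue can be sketched roughly as follows. This is only an illustration, not the attached patch: the class ModifiedTimeUpdateSketch, the PageTimes holder and the method applyLastModified are hypothetical names, and treating a value of 0 or less as "not available" is an assumption.

```java
// Hypothetical stand-in for the two timestamp fields of a page record.
final class ModifiedTimeUpdateSketch {

  static final class PageTimes {
    long modifiedTime;     // last known modification time, 0 if unknown
    long prevModifiedTime; // modification time known before the last update
  }

  // Rotates the timestamps when the fetch reported a last modified time.
  static void applyLastModified(PageTimes page, long lastModifiedTime) {
    if (lastModifiedTime <= 0) {
      return; // assumed to mean "not available": leave both fields untouched
    }
    // Order matters: save the old value before overwriting it.
    page.prevModifiedTime = page.modifiedTime;
    page.modifiedTime = lastModifiedTime;
  }
}
```

Doing the assignments in the opposite order would lose the previous value, since both fields would end up holding the new time.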