Andrzej, I think the feature I am after could be implemented with this patch
if I adapt it correctly. I am not sure, but the patch seems too old to apply
cleanly to the latest release, Nutch 0.8.1.

I want to implement a feature where the fetcher fetches files but only adds
them if they have been modified since the last fetch time. I want to
implement that for the filesystem first and extend it to network fetching
later. I would like to look at the full source code for your patch, in a zip
file if possible. I'd like to start working from your code, and once the
feature is implemented I will post it back here. You can either make the
source code available here or mail it to me at armel dot nene @
idna-solutions dot com.
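
To make the filesystem case concrete, here is a rough sketch of the check I
have in mind. It is only an illustration under my own assumptions: the class
ModifiedSinceCheck and the method shouldFetch are hypothetical names, not
part of Nutch or of the NUTCH-61 patch, and I assume the last fetch time is
available as milliseconds since the epoch.

```java
import java.io.File;

/**
 * Hypothetical helper (not part of Nutch or the NUTCH-61 patch):
 * decides whether a local file needs to be fetched again.
 */
public class ModifiedSinceCheck {

    /**
     * Returns true if the file is a regular file and was modified strictly
     * after lastFetchTime (milliseconds since the epoch). A lastFetchTime
     * of 0 or less is treated as "never fetched", so any existing regular
     * file qualifies.
     */
    public static boolean shouldFetch(File file, long lastFetchTime) {
        if (!file.isFile()) {
            return false; // missing, or not a regular file
        }
        if (lastFetchTime <= 0L) {
            return true; // assumption: no previous fetch recorded
        }
        return file.lastModified() > lastFetchTime;
    }
}
```

The fetcher would call shouldFetch(file, lastFetchTime) for each file and
skip those that return false. The comparison is strict, so a file whose
timestamp equals the last fetch time is treated as unmodified.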


-----Original Message-----
From: Andrzej Bialecki (JIRA) [mailto:[EMAIL PROTECTED] 
Sent: 12 November 2006 19:39
To: [email protected]
Subject: [jira] Commented: (NUTCH-61) Adaptive re-fetch interval. Detecting
umodified content

    [ http://issues.apache.org/jira/browse/NUTCH-61?page=comments#action_12449170 ]

Andrzej Bialecki  commented on NUTCH-61:
----------------------------------------

Unfortunately, this patch hasn't been applied yet, due to its complexity and
lack of testing.

But it will be, sooner or later, because this functionality is required for
any serious use.

I'm planning to bring this patch to the latest trunk, and then apply it
piece-wise over the next couple of weeks.

> Adaptive re-fetch interval. Detecting umodified content
> -------------------------------------------------------
>
>                 Key: NUTCH-61
>                 URL: http://issues.apache.org/jira/browse/NUTCH-61
>             Project: Nutch
>          Issue Type: New Feature
>          Components: fetcher
>            Reporter: Andrzej Bialecki 
>         Assigned To: Andrzej Bialecki 
>         Attachments: 20050606.diff, 20051230.txt, 20060227.txt,
>                      nutch-61-417287.patch
>
>
> Currently Nutch doesn't adjust automatically its re-fetch period, no
> matter if individual pages change seldom or frequently. The goal of these
> changes is to extend the current codebase to support various possible
> adjustments to re-fetch times and intervals, and specifically a re-fetch
> schedule which tries to adapt the period between consecutive fetches to the
> period of content changes.
> Also, these patches implement checking if the content has changed since
> last fetching; protocol plugins are also changed to make use of this
> information, so that if content is unmodified it doesn't have to be fetched
> and processed.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

