[ 
https://issues.apache.org/jira/browse/NUTCH-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13090115#comment-13090115
 ] 

Julien Nioche commented on NUTCH-1024:
--------------------------------------

There is a JIRA issue for 2.0, https://issues.apache.org/jira/browse/NUTCH-882, 
but I'd like to do it in 1.4.

We've discussed processing sitemaps on the mailing lists for some time and 
now have crawler-commons to help with the parsing. Sitemap entries carry 
information about how frequently they are likely to be modified, so sitemap 
processing is somewhat related to this issue.

> Dynamically set fetchInterval by MIME-type
> ------------------------------------------
>
>                 Key: NUTCH-1024
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1024
>             Project: Nutch
>          Issue Type: New Feature
>          Components: generator
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>            Priority: Minor
>             Fix For: 1.4
>
>         Attachments: AdaptiveFetchSchedule.patch, 
> MimeAdaptiveFetchSchedule.java, Nutch.patch, adaptive-mimetypes.txt
>
>
> Add a facility to configure default or fixed fetchInterval values by MIME-type. 
> This helps conserve resources for files that are known to change frequently, 
> never, or anywhere in between.
> * simple key\tvalue\n configuration file
> * only set fetchInterval for new documents
> * keep max fetchInterval fixed by current config
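
For illustration only, the three points in the description above could be sketched roughly as follows. This is not the code from the attached patches: the class and method names are hypothetical, the interval unit (seconds), the '#' comment handling and the example values are all assumptions.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch; names and behaviour are assumptions, not the attached patch. */
class MimeIntervalSketch {

    /**
     * Parses "mime-type<TAB>interval" lines into a map.
     * Assumption: intervals are in seconds; blank lines and '#' comments are skipped.
     */
    static Map<String, Integer> parse(Reader in) throws IOException {
        Map<String, Integer> intervals = new HashMap<>();
        BufferedReader reader = new BufferedReader(in);
        String line;
        while ((line = reader.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            String[] parts = line.split("\t");
            if (parts.length != 2) {
                continue; // ignore malformed lines in this sketch
            }
            intervals.put(parts[0].trim(), Integer.parseInt(parts[1].trim()));
        }
        return intervals;
    }

    /**
     * Picks the interval for a document. Only new documents get a
     * MIME-type-based value; existing ones keep their current interval.
     * The result never exceeds the configured maximum.
     */
    static int initialInterval(Map<String, Integer> intervals, String mimeType,
                               boolean isNew, int currentInterval,
                               int defaultInterval, int maxInterval) {
        if (!isNew) {
            return currentInterval;
        }
        int interval = intervals.getOrDefault(mimeType, defaultInterval);
        return Math.min(interval, maxInterval);
    }

    public static void main(String[] args) throws IOException {
        // Example values are made up for the sketch.
        String config = "text/html\t86400\napplication/pdf\t31536000\n";
        Map<String, Integer> intervals = parse(new StringReader(config));
        int defaultInterval = 2592000;
        int maxInterval = 7776000;

        System.out.println(initialInterval(intervals, "text/html", true, 0,
                defaultInterval, maxInterval));
        System.out.println(initialInterval(intervals, "application/pdf", true, 0,
                defaultInterval, maxInterval));
        System.out.println(initialInterval(intervals, "text/html", false, 172800,
                defaultInterval, maxInterval));
    }
}
```

A plain map lookup keeps the configuration file trivial to edit, and clamping against the existing maximum means the per-MIME-type values cannot override the current config.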

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
