[ 
https://issues.apache.org/jira/browse/NUTCH-882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13262488#comment-13262488
 ] 

Ferdy Galema commented on NUTCH-882:
------------------------------------

Committed. I realize that the current state is far from finished, however I 
figured it is enough to close this longstanding issue off. This makes room for 
people to easily play around with it and make improvements where necessary. 
(Adding definitions for other stores, new features such as storing stats 
etcetera.)

I'll leave the final closing to Julien, since he is the original reporter.

Please let me know if any of you disagree.
                
> Design a Host table in GORA
> ---------------------------
>
>                 Key: NUTCH-882
>                 URL: https://issues.apache.org/jira/browse/NUTCH-882
>             Project: Nutch
>          Issue Type: New Feature
>    Affects Versions: nutchgora
>            Reporter: Julien Nioche
>             Fix For: nutchgora
>
>         Attachments: NUTCH-882-v1.patch, NUTCH-882-v3.txt, NUTCH-882-v3.txt, 
> hostdb.patch
>
>
> Having a separate GORA table for storing information about hosts (and 
> domains?) would be very useful for : 
> * customising the behaviour of the fetching on a host basis e.g. number of 
> threads, min time between threads etc...
> * storing stats
> * keeping metadata and possibly propagate them to the webpages 
> * keeping a copy of the robots.txt and possibly use that later to filter the 
> webtable
> * store sitemaps files and update the webtable accordingly
> I'll try to come up with a GORA schema for such a host table but any comments 
> are of course already welcome 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to