[ http://issues.apache.org/jira/browse/NUTCH-357?page=comments#action_12442079 ]

Greg Kim commented on NUTCH-357:
--------------------------------

Thanks, Sami! It's a great, simple framework for system tests.

> crawling simulation
> -------------------
>
>                 Key: NUTCH-357
>                 URL: http://issues.apache.org/jira/browse/NUTCH-357
>             Project: Nutch
>          Issue Type: Improvement
>    Affects Versions: 0.8.1, 0.9.0
>            Reporter: Stefan Groschupf
>             Fix For: 0.9.0
>
>         Attachments: protocol-simulation-pluginV1.patch
>
>
> We recently discovered some serious issues related to crawling and scoring.
> Reproducing these problems is rather difficult: first, it is not polite to
> re-crawl a set of pages again and again; second, it is difficult to catch
> the page that causes a problem.
> Therefore it would be very useful to have a testbed to simulate crawls where
> we can control the responses of the "web servers".
> To begin with, simulating very basic situations such as a page that points to
> itself, link chains, or internal links would already be very useful.
> Later on, however, simulating crawls against existing data collections like TREC
> or a webgraph would be much more interesting, for instance to calculate the
> quality of the Nutch OPIC implementation against the PageRank scores of the
> webgraph, or to evaluate crawling strategies.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
