[ http://issues.apache.org/jira/browse/NUTCH-357?page=comments#action_12442079 ]

Greg Kim commented on NUTCH-357:
--------------------------------

Thanks Sami! It's a great, simple framework for system tests.

> crawling simulation
> -------------------
>
>                 Key: NUTCH-357
>                 URL: http://issues.apache.org/jira/browse/NUTCH-357
>             Project: Nutch
>          Issue Type: Improvement
>    Affects Versions: 0.8.1, 0.9.0
>            Reporter: Stefan Groschupf
>             Fix For: 0.9.0
>
>         Attachments: protocol-simulation-pluginV1.patch
>
>
> We recently discovered some serious issues related to crawling and scoring.
> Reproducing these problems is somewhat difficult: first, it is not polite
> to re-crawl a set of pages again and again; second, it is difficult to
> catch the page that causes a problem.
> Therefore it would be very useful to have a testbed to simulate crawls where
> we can control the response of "web servers".
> To begin with, simulating very basic situations, such as a page that points
> to itself, link chains, or internal links, would already be very useful.
> Later on, however, simulating crawls against existing data collections like
> TREC or a webgraph would be much more interesting, for instance to calculate
> the quality of the Nutch OPIC implementation against PageRank scores of the
> webgraph, or to evaluate crawling strategies.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
