Massimo Miccoli wrote:
Any news about integrating OPIC into mapred? I have time to develop OPIC on Nutch Mapred. Can you help me get started? Judging by the email from Carlos Alberto-Alejandro CASTILLO-Ocaranza, there seems to be a best way to integrate OPIC into the old webdb. Is that approach also valid for the CrawlDb in Mapred?

Yes.  I think the way to implement this in the mapred branch is:

[snip]

Just for grins, I modified Nutch 0.7 to use OPIC. It was a quick hack, where I stuffed the OPIC score in a page's nextScore field, added to this value when processing a page's outlinks, and then used it when ranking links in the FetchListTool.
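
As a rough illustration of the hack described above (not the actual Nutch 0.7 code), here is a minimal Python sketch. The names `process_page`, `rank_for_fetch`, and the `next_score` dictionary are hypothetical stand-ins for the page's nextScore field and the FetchListTool ranking step. Resetting a page's score after it is distributed is an assumption taken from how OPIC is usually described; the hack may or may not have done that.

```python
from collections import defaultdict


def process_page(next_score, url, outlinks):
    """Split the score accumulated by `url` evenly among its outlinks.

    `next_score` maps URL -> accumulated OPIC score (hypothetical stand-in
    for the nextScore field). Returns the amount that was distributed.
    """
    cash = next_score[url]
    # ASSUMPTION: the page's score is reset once it has been handed out.
    # Reset first so a self-link still receives its share afterwards.
    next_score[url] = 0.0
    if outlinks:
        share = cash / len(outlinks)
        for target in outlinks:
            next_score[target] += share
    return cash


def rank_for_fetch(next_score, candidates, limit):
    """Return up to `limit` candidate URLs, highest accumulated score first."""
    ranked = sorted(candidates, key=lambda u: next_score[u], reverse=True)
    return ranked[:limit]


# Usage: two seed pages start with equal score; "a" links to "b" and "c".
scores = defaultdict(float, {"a": 1.0, "b": 1.0})
process_page(scores, "a", ["b", "c"])
fetch_list = rank_for_fetch(scores, ["b", "c"], limit=1)
```

In this sketch a page that collects many inbound links accumulates score only in proportion to what its referrers had to give, which is the property that keeps link-heavy sites from dominating the fetch list.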

Seems to be working well, though without a well-constrained crawl environment it's hard to come up with quantitative results. At least we no longer spend a disproportionate amount of our crawl time on some sites (like about.com) that wind up with lots of in-bound links.

Note that our usage is also a bit non-standard in that we're doing a vertical crawl, and have a way of scoring page contents at crawl time. So we use this in combination with the OPIC score as the page score that we divide up among the outbound links.
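
To make the combination concrete, here is a small sketch under stated assumptions. The message does not say how the content score and the OPIC score are combined, so the multiplication below is purely an assumption for illustration, as are the function name `distribute_combined` and the idea that the content score lies between 0 and 1.

```python
def distribute_combined(next_score, url, outlinks, content_score):
    """Divide a combined page score among the page's outbound links.

    `next_score` is a plain dict of URL -> accumulated OPIC score.
    `content_score` is the crawl-time relevance of the page's contents.
    ASSUMPTION: the two are combined by multiplication; the original
    message does not specify the formula.
    """
    combined = next_score.get(url, 0.0) * content_score
    # ASSUMPTION: the page's own score is reset after distribution.
    next_score[url] = 0.0
    if outlinks:
        share = combined / len(outlinks)
        for target in outlinks:
            next_score[target] = next_score.get(target, 0.0) + share
    return combined


# Usage: an on-topic page passes on more score than an off-topic one.
scores = {"on_topic": 2.0, "off_topic": 2.0}
distribute_combined(scores, "on_topic", ["x"], content_score=1.0)
distribute_combined(scores, "off_topic", ["y"], content_score=0.25)
```

With any combination of this kind, links found on pages that score poorly for the vertical receive less score and so sink in the fetch ranking.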

-- Ken
--
Ken Krugler
Krugle, Inc.
+1 530-470-9200
