Massimo Miccoli wrote:
Any news about integration of OPIC in mapred? I have time to
develop OPIC on Nutch Mapred. Can you help me to start?
Judging by the email from Carlos Alberto-Alejandro CASTILLO-Ocaranza,
there seems to be a best way to integrate OPIC into the old webdb. Is
that approach also valid for the CrawlDb in Mapred?
Yes. I think the way to implement this in the mapred branch is:
[snip]
Just for grins, I modified Nutch 0.7 to use OPIC. It was a quick
hack, where I stuffed the OPIC score in a page's nextScore field,
added to this value when processing a page's outlinks, and then used
it when ranking links in the FetchListTool.
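
For readers unfamiliar with the idea, the hack described above can be sketched roughly as follows. This is only an illustration, not the actual Nutch 0.7 change: the class OpicSketch and its methods are hypothetical, a plain in-memory map stands in for the page's nextScore field, and the ranking method only mimics what FetchListTool would do with the score.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of OPIC-style score passing; not Nutch code. */
public class OpicSketch {
    // Pending OPIC "cash" per URL, standing in for the nextScore field.
    private final Map<String, Double> cash = new HashMap<>();

    /** Give a URL some initial cash (e.g. for seed pages). */
    public void inject(String url, double initialCash) {
        cash.merge(url, initialCash, Double::sum);
    }

    /** After a page is fetched, split its cash evenly among its outlinks. */
    public void processOutlinks(String url, List<String> outlinks) {
        if (outlinks.isEmpty()) {
            return; // nothing to pass on; this sketch leaves the cash in place
        }
        double available = cash.getOrDefault(url, 0.0);
        cash.put(url, 0.0); // the page spends its cash before links are credited
        double share = available / outlinks.size();
        for (String target : outlinks) {
            cash.merge(target, share, Double::sum);
        }
    }

    /** Current cash of a URL, or 0 if it has never been seen. */
    public double cashOf(String url) {
        return cash.getOrDefault(url, 0.0);
    }

    /** The n URLs with the most cash, highest first (fetch-list ordering). */
    public List<String> rank(int n) {
        List<String> urls = new ArrayList<>(cash.keySet());
        urls.sort((a, b) -> Double.compare(cash.get(b), cash.get(a)));
        return new ArrayList<>(urls.subList(0, Math.min(n, urls.size())));
    }
}
```

The point of the even split is that a page's influence on its targets shrinks as its number of outlinks grows, and each page can only pass on what it has itself received.
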
Seems to be working well, though without a well-constrained crawl
environment it's hard to come up with quantitative results. At least
we no longer spend a disproportionate amount of our crawl time on
some sites (like about.com) that wind up with lots of in-bound links.
Note that our usage is also a bit non-standard in that we're doing a
vertical crawl, and have a way of scoring page contents at crawl
time. So we use this in combination with the OPIC score as the page
score that we divide up among the outbound links.
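
A rough sketch of that combination, under stated assumptions: the message does not say how the content score and the OPIC score are combined, so the multiplication below is purely an assumed rule, and VerticalScoreSketch and its method names are hypothetical rather than anything in Nutch.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only; the combination rule is an assumption. */
public class VerticalScoreSketch {

    /** Assumed rule: scale the OPIC score by a content relevance in [0, 1]. */
    public static double pageScore(double opicScore, double contentScore) {
        return opicScore * contentScore;
    }

    /** Split the combined page score evenly among the page's outlinks. */
    public static Map<String, Double> distribute(
            double opicScore, double contentScore, List<String> outlinks) {
        Map<String, Double> contributions = new HashMap<>();
        if (outlinks.isEmpty()) {
            return contributions; // no outlinks, nothing to hand out
        }
        double share = pageScore(opicScore, contentScore) / outlinks.size();
        for (String target : outlinks) {
            // merge() so that a link repeated on the page is credited each time
            contributions.merge(target, share, Double::sum);
        }
        return contributions;
    }
}
```

With a rule like this, an off-topic page (content score near 0) passes almost nothing to its outlinks, which is one way a vertical crawl could stay focused.
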
-- Ken
--
Ken Krugler
Krugle, Inc.
+1 530-470-9200