[
https://issues.apache.org/jira/browse/NUTCH-1347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265750#comment-13265750
]
behnam nikbakht commented on NUTCH-1347:
----------------------------------------
I do not understand your solution.
When I simply add a print line to the getFetchItem() method in the
FetchItemQueue class, I see impolite requests to the same host:
try {
it = queue.remove(0);
inProgress.add(it);
+System.out.println(it.url.toString()+"<<"+System.currentTimeMillis());
We could scale minCrawlDelay or crawlDelay and maxThreads by the number of map
tasks, but there is no coordination between tasks, and the tasks do not each
receive an equal number of URLs from each host.
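
A minimal sketch of why scaling the delay alone is not enough. It assumes each
map task fetches from the same host on a fixed period and starts at its own
offset; the class and method names are hypothetical and are not part of Nutch.

```java
import java.util.Arrays;

public class ScaledDelaySketch {
    /** Per-task delay obtained by scaling the configured crawl delay by the task count. */
    public static long perTaskDelayMs(long crawlDelayMs, int numMapTasks) {
        if (numMapTasks < 1) {
            throw new IllegalArgumentException("numMapTasks must be >= 1");
        }
        return Math.multiplyExact(crawlDelayMs, (long) numMapTasks);
    }

    /**
     * Smallest gap between consecutive requests to one host when every task
     * repeats with period perTaskDelayMs, starting at its own offset.
     * Assumption: at least one offset, and all offsets lie in [0, perTaskDelayMs).
     */
    public static long smallestGapMs(long[] taskStartOffsetsMs, long perTaskDelayMs) {
        long[] sorted = taskStartOffsetsMs.clone();
        Arrays.sort(sorted);
        // Gap from the last task in one round to the first task in the next round.
        long min = perTaskDelayMs - (sorted[sorted.length - 1] - sorted[0]);
        for (int i = 1; i < sorted.length; i++) {
            min = Math.min(min, sorted[i] - sorted[i - 1]);
        }
        return min;
    }
}
```

With a crawl delay of 5000 ms and two tasks, each task waits 10000 ms, so the
average request rate is right. If the tasks start 100 ms apart, however, the
host still sees two requests only 100 ms apart in every round, because nothing
aligns the tasks with each other.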
I also found a bug in the selector reduce task in the generate phase, which
results from the lack of coordination between tasks.
To address these problems I use a redis-server, which is a fast data server for
maintaining (key,value) pairs.
Redis maintains variables such as delay, maxThreads, ... for each host, and
they can be set dynamically according to the rate of successful and blocked
requests for each host.
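
A rough sketch of the shared per-host state described above. To stay
self-contained it uses an in-memory map as a stand-in for the redis-server; a
real setup would keep the same fields in Redis so that every map task sees
them. The class name, the method names, and the double-on-block,
halve-on-success rule are assumptions made for illustration, not the actual
implementation.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** In-memory stand-in for per-host politeness state shared by all fetcher tasks. */
public class SharedHostPoliteness {
    /** State kept per host (hypothetical layout). */
    private static final class HostState {
        long delayMs;
        int maxThreads;
        int inProgress;
        long nextAllowedMs;

        HostState(long delayMs, int maxThreads) {
            this.delayMs = delayMs;
            this.maxThreads = maxThreads;
        }
    }

    private final Map<String, HostState> hosts = new ConcurrentHashMap<>();
    private final long defaultDelayMs;
    private final int defaultMaxThreads;

    public SharedHostPoliteness(long defaultDelayMs, int defaultMaxThreads) {
        this.defaultDelayMs = defaultDelayMs;
        this.defaultMaxThreads = defaultMaxThreads;
    }

    private HostState state(String host) {
        return hosts.computeIfAbsent(host, h -> new HostState(defaultDelayMs, defaultMaxThreads));
    }

    /** Returns true if some task may start a request to the host at time nowMs. */
    public boolean tryAcquire(String host, long nowMs) {
        HostState s = state(host);
        synchronized (s) {
            if (s.inProgress >= s.maxThreads || nowMs < s.nextAllowedMs) {
                return false;
            }
            s.inProgress++;
            s.nextAllowedMs = nowMs + s.delayMs;
            return true;
        }
    }

    /** Records the outcome of a request and adapts the host's delay. */
    public void release(String host, long nowMs, boolean success) {
        HostState s = state(host);
        synchronized (s) {
            if (s.inProgress > 0) {
                s.inProgress--;
            }
            if (success) {
                // Assumed rule: relax towards the default delay after a success.
                s.delayMs = Math.max(defaultDelayMs, s.delayMs / 2);
            } else {
                // Assumed rule: back off after a block, capped at 64x the default.
                s.delayMs = Math.min(s.delayMs * 2, 64 * defaultDelayMs);
            }
            s.nextAllowedMs = nowMs + s.delayMs;
        }
    }

    /** Current delay for the host, in milliseconds. */
    public long currentDelayMs(String host) {
        HostState s = state(host);
        synchronized (s) {
            return s.delayMs;
        }
    }
}
```

Each fetcher thread would call tryAcquire() before taking an item for a host
and release() once the request finishes, so the delay and the thread limit
apply to the host as a whole rather than to each map task separately.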
> fetcher politeness related to map-reduce
> ----------------------------------------
>
> Key: NUTCH-1347
> URL: https://issues.apache.org/jira/browse/NUTCH-1347
> Project: Nutch
> Issue Type: Improvement
> Components: fetcher
> Affects Versions: 1.4
> Reporter: behnam nikbakht
> Labels: fetch
>
> When Nutch runs on Hadoop, each map task, following the map-reduce model,
> works only on its own data, so each fetcher map task works with its own
> queues and knows nothing about the queues of other tasks. It therefore
> enforces the delay between successive requests and the maximum number of
> concurrent requests only on its own queues. With a simple test we found that
> this is not a good politeness mechanism when there are multiple map tasks.