[ http://issues.apache.org/jira/browse/NUTCH-344?page=all ]
Greg Kim updated NUTCH-344:
---------------------------
Affects Version/s: 0.8
> Fetcher threads blocked on synchronized block in cleanExpiredServerBlocks
> -------------------------------------------------------------------------
>
> Key: NUTCH-344
> URL: http://issues.apache.org/jira/browse/NUTCH-344
> Project: Nutch
> Issue Type: Bug
> Components: fetcher
> Affects Versions: 0.8, 0.9.0, 0.8.1
> Environment: All
> Reporter: Greg Kim
> Attachments: cleanExpiredServerBlocks.patch
>
>
> The recent change to the following code in HttpBase.java tends to block
> fetcher threads while one thread busy-waits:
> private static void cleanExpiredServerBlocks() {
>   synchronized (BLOCKED_ADDR_TO_TIME) {
>     while (!BLOCKED_ADDR_QUEUE.isEmpty()) {   <===== LINE 3
>       String host = (String) BLOCKED_ADDR_QUEUE.getLast();
>       long time = ((Long) BLOCKED_ADDR_TO_TIME.get(host)).longValue();
>       if (time <= System.currentTimeMillis()) {
>         BLOCKED_ADDR_TO_TIME.remove(host);
>         BLOCKED_ADDR_QUEUE.removeLast();
>       }
>     }
>   }
> }
> LINE 3: As long as there are *any* entries in BLOCKED_ADDR_QUEUE, the
> thread that first enters this block busy-waits until the queue becomes
> empty. Whenever the entry at the tail has not yet expired, the loop spins
> without removing anything, while all other threads block on the
> synchronized block. This leads to extremely poor fetcher performance.
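To make the LINE 3 busy-wait concrete, here is a rough Java sketch that copies the loop body above but adds a hypothetical iteration cap (`maxIterations`) so that it terminates and the wasted passes can be counted. The class name `BusyWaitDemo`, the generic collection types, and the assumption that new blocks are pushed with `addFirst` are illustrative choices, not the actual HttpBase.java code.

```java
import java.util.HashMap;
import java.util.LinkedList;
import java.util.Map;

// Hypothetical demo class: reproduces the loop body of cleanExpiredServerBlocks()
// with an added iteration cap so the busy-wait can be observed without hanging.
class BusyWaitDemo {
    static final LinkedList<String> BLOCKED_ADDR_QUEUE = new LinkedList<String>();
    static final Map<String, Long> BLOCKED_ADDR_TO_TIME = new HashMap<String, Long>();

    // Runs at most maxIterations passes of the original loop and returns how many
    // passes removed nothing (i.e. were pure spinning while holding the lock).
    static long spinCount(long maxIterations) {
        long wasted = 0;
        synchronized (BLOCKED_ADDR_TO_TIME) {
            for (long i = 0; i < maxIterations && !BLOCKED_ADDR_QUEUE.isEmpty(); i++) {
                String host = BLOCKED_ADDR_QUEUE.getLast();
                long time = BLOCKED_ADDR_TO_TIME.get(host);
                if (time <= System.currentTimeMillis()) {
                    BLOCKED_ADDR_TO_TIME.remove(host);
                    BLOCKED_ADDR_QUEUE.removeLast();
                } else {
                    wasted++; // tail entry not expired yet: loop again, lock still held
                }
            }
        }
        return wasted;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        // One host blocked for another 60 seconds (assumed newest-first insertion).
        BLOCKED_ADDR_QUEUE.addFirst("example.com");
        BLOCKED_ADDR_TO_TIME.put("example.com", now + 60000L);
        long wasted = spinCount(1000000L);
        System.out.println("passes that removed nothing: " + wasted);
        System.out.println("queue still non-empty: " + !BLOCKED_ADDR_QUEUE.isEmpty());
    }
}
```

Without the cap, every one of those passes would repeat until the 60 seconds elapse, and every other fetcher thread calling the method would wait on the monitor for that whole time.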
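> Since the check-in that respects crawlDelay in robots.txt, we are no longer
> guaranteed that BLOCKED_ADDR_QUEUE is a FIFO list: each host's block now
> expires after its own crawl delay, so insertion order no longer matches
> expiry order. The simple fix is to iterate over the queue once rather than
> busy-waiting.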
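A minimal Java sketch of the single-pass approach, assuming the same two collections and that new blocks are pushed with `addFirst`. It is an illustration only, not the contents of cleanExpiredServerBlocks.patch; the class name `SinglePassCleanup`, the generic types, and the null check are additions.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.Map;

// Hypothetical sketch of a single-pass cleanup; not the attached patch.
class SinglePassCleanup {
    static final LinkedList<String> BLOCKED_ADDR_QUEUE = new LinkedList<String>();
    static final Map<String, Long> BLOCKED_ADDR_TO_TIME = new HashMap<String, Long>();

    static void cleanExpiredServerBlocks() {
        synchronized (BLOCKED_ADDR_TO_TIME) {
            long now = System.currentTimeMillis();
            // The queue is not ordered by expiry time, so visit every entry exactly
            // once instead of waiting on whichever entry happens to be at the tail.
            Iterator<String> it = BLOCKED_ADDR_QUEUE.iterator();
            while (it.hasNext()) {
                String host = it.next();
                Long time = BLOCKED_ADDR_TO_TIME.get(host);
                // Defensive null check (an addition): drop queue entries with no time.
                if (time == null || time.longValue() <= now) {
                    BLOCKED_ADDR_TO_TIME.remove(host);
                    it.remove();
                }
            }
        }
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        // slow.example was blocked first, with a long crawl delay ...
        BLOCKED_ADDR_QUEUE.addFirst("slow.example");
        BLOCKED_ADDR_TO_TIME.put("slow.example", now + 30000L);
        // ... fast.example was blocked later, but its short delay has already expired.
        BLOCKED_ADDR_QUEUE.addFirst("fast.example");
        BLOCKED_ADDR_TO_TIME.put("fast.example", now - 1L);

        cleanExpiredServerBlocks(); // returns after one pass; no busy-wait
        System.out.println("still blocked: " + BLOCKED_ADDR_QUEUE);
    }
}
```

In this scenario slow.example sits at the tail, so the original loop would spin on it for 30 seconds; the single pass instead removes only the expired fast.example entry and returns at once, leaving slow.example for a later call.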
--
This message is automatically generated by JIRA.