Sebastian Nagel created NUTCH-2579:
--------------------------------------
Summary: Fetcher to use parsed URL to call ProtocolFactory.getProtocol(url)
Key: NUTCH-2579
URL: https://issues.apache.org/jira/browse/NUTCH-2579
Project: Nutch
Issue Type: Improvement
Components: fetcher, protocol
Affects Versions: 1.14
Reporter: Sebastian Nagel
Fix For: 1.15
The call to ProtocolFactory.getProtocol(url) is synchronized, so threads in a
multi-threaded fetcher have to wait for its lock. The method takes the URL
string, although it would be more efficient to pass the parsed URL already held
in the FetchItem: the lock could then be released sooner. In addition, parsing
the URL inside the synchronized call takes a second lock in the URL stream
handler lookup, as this thread dump shows:
{noformat}
"FetcherThread" #37 daemon prio=5 os_prio=0 tid=0x00007f21edea2000 nid=0x5c20 waiting for monitor entry [0x00007f21bacb4000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at java.util.Hashtable.get(Hashtable.java:363)
        - waiting to lock <0x00000005e01b5840> (a java.util.Hashtable)
        at java.net.URL.getURLStreamHandler(URL.java:1135)
        at java.net.URL.<init>(URL.java:599)
        at java.net.URL.<init>(URL.java:490)
        at java.net.URL.<init>(URL.java:439)
        at org.apache.nutch.protocol.ProtocolFactory.getProtocol(ProtocolFactory.java:74)
        - locked <0x00000005fc5f4fb8> (a org.apache.nutch.protocol.ProtocolFactory)
        at org.apache.nutch.fetcher.FetcherThread.run(FetcherThread.java:299)
{noformat}
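A minimal sketch of the idea follows. It is not Nutch's actual code: the class ProtocolLookupSketch, its cache, and the "handler-for-..." strings are hypothetical stand-ins for ProtocolFactory and its Protocol instances. It only illustrates how an overload that accepts an already parsed java.net.URL keeps URL parsing out of the critical section.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a protocol factory; NOT the Nutch implementation.
class ProtocolLookupSketch {

  // Illustrative cache: URL scheme -> handler name (assumed, for the demo).
  private final Map<String, String> cache = new HashMap<>();

  // String variant: the URL is parsed while the factory lock is held, so the
  // stream handler lookup inside the URL constructor extends the lock time.
  @SuppressWarnings("deprecation") // new URL(String) is deprecated in recent JDKs
  public synchronized String getProtocol(String urlString)
      throws MalformedURLException {
    URL url = new URL(urlString);
    return lookup(url.getProtocol());
  }

  // URL variant: the caller passes a URL it has already parsed, so only the
  // cache lookup runs under the lock.
  public String getProtocol(URL url) {
    String scheme = url.getProtocol(); // no parsing, no lock needed here
    synchronized (this) {
      return lookup(scheme);
    }
  }

  // Must be called with the lock held, because HashMap is not thread-safe.
  private String lookup(String scheme) {
    return cache.computeIfAbsent(scheme, s -> "handler-for-" + s);
  }

  public static void main(String[] args) throws Exception {
    ProtocolLookupSketch factory = new ProtocolLookupSketch();
    // Parsed once up front, as a fetch item would hold it.
    URL parsed = URI.create("http://example.org/page").toURL();
    System.out.println(factory.getProtocol(parsed));
  }
}
```

Both variants return the same handler for the same scheme. The difference is only in how much work is done while the lock is held.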
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)