Sebastian Nagel created NUTCH-2579:
--------------------------------------

             Summary: Fetcher to use parsed URL to call ProtocolFactory.getProtocol(url)
                 Key: NUTCH-2579
                 URL: https://issues.apache.org/jira/browse/NUTCH-2579
             Project: Nutch
          Issue Type: Improvement
          Components: fetcher, protocol
    Affects Versions: 1.14
            Reporter: Sebastian Nagel
             Fix For: 1.15


ProtocolFactory.getProtocol(url) is synchronized, so the threads of a 
multi-threaded fetcher wait for its lock. The method takes the URL string, 
although it would be more efficient to pass the parsed URL already held in the 
FetchItem: the URL would not be parsed again while the lock is held, and the 
lock could be released sooner. In addition, parsing the URL acquires a second 
lock in the URL stream handler lookup, as this thread dump shows:
{noformat}
"FetcherThread" #37 daemon prio=5 os_prio=0 tid=0x00007f21edea2000 nid=0x5c20 waiting for monitor entry [0x00007f21bacb4000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at java.util.Hashtable.get(Hashtable.java:363)
        - waiting to lock <0x00000005e01b5840> (a java.util.Hashtable)
        at java.net.URL.getURLStreamHandler(URL.java:1135)
        at java.net.URL.<init>(URL.java:599)
        at java.net.URL.<init>(URL.java:490)
        at java.net.URL.<init>(URL.java:439)
        at org.apache.nutch.protocol.ProtocolFactory.getProtocol(ProtocolFactory.java:74)
        - locked <0x00000005fc5f4fb8> (a org.apache.nutch.protocol.ProtocolFactory)
        at org.apache.nutch.fetcher.FetcherThread.run(FetcherThread.java:299)
{noformat}
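
The idea can be illustrated with a minimal sketch in Java. It is not the 
Nutch code and not a proposed patch: SketchProtocol and 
SchemeKeyedProtocolCache are hypothetical names, and the sketch assumes that 
protocol implementations can be cached per URL scheme. The lookup accepts an 
already parsed java.net.URL, so no URL is constructed inside the lookup, and 
it uses a ConcurrentHashMap instead of a synchronized method.

```java
import java.net.URL;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical stand-in for a protocol plugin; NOT the Nutch Protocol interface.
interface SketchProtocol {
  String scheme();
}

// Hypothetical cache keyed by URL scheme; NOT Nutch's ProtocolFactory.
final class SchemeKeyedProtocolCache {
  private final Map<String, SketchProtocol> cache = new ConcurrentHashMap<>();
  // Assumption: creates the protocol implementation for a given scheme.
  private final Function<String, SketchProtocol> loader;

  SchemeKeyedProtocolCache(Function<String, SketchProtocol> loader) {
    this.loader = loader;
  }

  // The caller passes the URL it has already parsed (for example the one
  // kept with the fetch item), so this method never calls new URL(...).
  SketchProtocol getProtocol(URL url) {
    // computeIfAbsent runs the loader at most once per scheme; lookups of
    // an already cached scheme do not contend on a single monitor.
    return cache.computeIfAbsent(url.getProtocol(), loader);
  }
}
```

A fetcher thread in this sketch would call cache.getProtocol(parsedUrl) with 
the URL it already holds, so neither the cache lookup nor the stream handler 
lookup in the java.net.URL constructor runs under a shared lock.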



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
