[
https://issues.apache.org/jira/browse/NUTCH-2579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Work on NUTCH-2579 started by Sebastian Nagel.
----------------------------------------------
> Fetcher to use parsed URL to call ProtocolFactory.getProtocol(url)
> ------------------------------------------------------------------
>
> Key: NUTCH-2579
> URL: https://issues.apache.org/jira/browse/NUTCH-2579
> Project: Nutch
> Issue Type: Improvement
> Components: fetcher, protocol
> Affects Versions: 1.14
> Reporter: Sebastian Nagel
> Assignee: Sebastian Nagel
> Priority: Minor
> Fix For: 1.15
>
>
> The call to ProtocolFactory.getProtocol(url) is synchronized, so threads in a
> multi-threaded fetcher wait for its lock. The method takes the URL string,
> although it would be more efficient to pass the parsed URL already held in the
> FetchItem: the lock could then be released sooner. In addition, parsing the URL
> inside the synchronized method contends for a second lock in the URL stream
> handler lookup:
> {noformat}
> "FetcherThread" #37 daemon prio=5 os_prio=0 tid=0x00007f21edea2000 nid=0x5c20 waiting for monitor entry [0x00007f21bacb4000]
> java.lang.Thread.State: BLOCKED (on object monitor)
> at java.util.Hashtable.get(Hashtable.java:363)
> - waiting to lock <0x00000005e01b5840> (a java.util.Hashtable)
> at java.net.URL.getURLStreamHandler(URL.java:1135)
> at java.net.URL.<init>(URL.java:599)
> at java.net.URL.<init>(URL.java:490)
> at java.net.URL.<init>(URL.java:439)
> at org.apache.nutch.protocol.ProtocolFactory.getProtocol(ProtocolFactory.java:74)
> - locked <0x00000005fc5f4fb8> (a org.apache.nutch.protocol.ProtocolFactory)
> at org.apache.nutch.fetcher.FetcherThread.run(FetcherThread.java:299)
> {noformat}
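
The proposal above can be illustrated with a minimal Java sketch. It is not the Nutch code: the class ProtocolLookupSketch, its cache of made-up plugin names, and the parse helper are hypothetical stand-ins chosen for illustration. The sketch only contrasts a synchronized lookup that parses the URL string while holding the lock with one that accepts a URL the caller has already parsed.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a protocol factory; NOT the real Nutch class.
class ProtocolLookupSketch {

  // Maps a URL scheme to a made-up plugin name (assumption for illustration).
  private final Map<String, String> cache = new HashMap<>();

  // Helper that parses a URL string and hides the checked exception.
  static URL parse(String urlString) {
    try {
      return new URL(urlString);
    } catch (MalformedURLException e) {
      throw new IllegalArgumentException("Malformed URL: " + urlString, e);
    }
  }

  // Resembles the reported behaviour: the URL string is parsed while the
  // factory's monitor is held, so other threads wait during parsing too.
  synchronized String getProtocol(String urlString) {
    URL url = parse(urlString); // parsing happens inside the lock
    return lookup(url.getProtocol());
  }

  // Resembles the proposal: the caller passes an already parsed URL, so the
  // synchronized section only performs the cache lookup.
  synchronized String getProtocol(URL url) {
    return lookup(url.getProtocol());
  }

  // Only called from the synchronized methods above.
  private String lookup(String scheme) {
    return cache.computeIfAbsent(scheme, s -> "protocol-" + s);
  }

  public static void main(String[] args) {
    ProtocolLookupSketch factory = new ProtocolLookupSketch();
    // Parse once, outside the lock, as a fetch item would hold the URL.
    URL parsed = parse("https://nutch.apache.org/");
    System.out.println(factory.getProtocol(parsed));
    System.out.println(factory.getProtocol("http://example.org/"));
  }
}
```

In this sketch both overloads return the same result for the same scheme; the difference is only how much work runs while the monitor is held.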
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)