[ 
https://issues.apache.org/jira/browse/NUTCH-2579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on NUTCH-2579 started by Sebastian Nagel.
----------------------------------------------
> Fetcher to use parsed URL to call ProtocolFactory.getProtocol(url)
> ------------------------------------------------------------------
>
>                 Key: NUTCH-2579
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2579
>             Project: Nutch
>          Issue Type: Improvement
>          Components: fetcher, protocol
>    Affects Versions: 1.14
>            Reporter: Sebastian Nagel
>            Assignee: Sebastian Nagel
>            Priority: Minor
>             Fix For: 1.15
>
>
> The call to ProtocolFactory.getProtocol(url) is synchronized, so threads in a 
> multi-threaded fetcher wait for the lock. It takes the URL string, although it 
> would be more efficient to pass the parsed URL already held in the FetchItem, 
> so that the lock can be released sooner. In addition, parsing the URL itself 
> contends for a lock in the URL stream handler lookup:
> {noformat}
> "FetcherThread" #37 daemon prio=5 os_prio=0 tid=0x00007f21edea2000 nid=0x5c20 waiting for monitor entry [0x00007f21bacb4000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>         at java.util.Hashtable.get(Hashtable.java:363)
>         - waiting to lock <0x00000005e01b5840> (a java.util.Hashtable)
>         at java.net.URL.getURLStreamHandler(URL.java:1135)
>         at java.net.URL.<init>(URL.java:599)
>         at java.net.URL.<init>(URL.java:490)
>         at java.net.URL.<init>(URL.java:439)
>         at org.apache.nutch.protocol.ProtocolFactory.getProtocol(ProtocolFactory.java:74)
>         - locked <0x00000005fc5f4fb8> (a org.apache.nutch.protocol.ProtocolFactory)
>         at org.apache.nutch.fetcher.FetcherThread.run(FetcherThread.java:299)
> {noformat}
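
The proposal in the description can be sketched roughly as follows. This is a
minimal illustration only, not the change committed for this issue: the class
names ProtocolLookupSketch, SketchProtocol and SketchFetchItem are hypothetical
stand-ins, and the per-scheme cache is an assumption about how a factory might
store its protocol instances. The point it shows is that the URL is parsed once
when the fetch item is created, and the lookup then takes the parsed
java.net.URL, so no string parsing happens inside the factory call.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical stand-in for a protocol plugin instance (not the Nutch Protocol interface). */
final class SketchProtocol {
  final String scheme;

  SketchProtocol(String scheme) {
    this.scheme = scheme;
  }
}

/** Hypothetical fetch item: the URL string is parsed once, at construction time. */
final class SketchFetchItem {
  final String url;
  final URL u;

  SketchFetchItem(String url) {
    this.url = url;
    try {
      this.u = new URL(url); // the only place where the string is parsed
    } catch (MalformedURLException e) {
      throw new IllegalArgumentException("Malformed URL: " + url, e);
    }
  }
}

/** Hypothetical factory sketch; names and structure are assumptions, not Nutch code. */
final class ProtocolLookupSketch {
  // One cached instance per scheme. A concurrent map avoids a single
  // factory-wide monitor that every fetcher thread would have to enter.
  private final ConcurrentMap<String, SketchProtocol> cache = new ConcurrentHashMap<>();

  /** Looks up the protocol from an already parsed URL; no parsing happens here. */
  SketchProtocol getProtocol(URL url) {
    return cache.computeIfAbsent(url.getProtocol(), SketchProtocol::new);
  }

  public static void main(String[] args) {
    ProtocolLookupSketch factory = new ProtocolLookupSketch();
    SketchFetchItem item = new SketchFetchItem("http://example.org/index.html");
    // The fetcher thread hands over the parsed URL kept in the fetch item.
    SketchProtocol protocol = factory.getProtocol(item.u);
    System.out.println(protocol.scheme);
  }
}
```

In this sketch a fetcher thread would call factory.getProtocol(item.u) instead
of passing item.url, so the java.net.URL constructor (and its stream handler
lookup) runs once per item rather than on every factory call.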



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
