[ https://issues.apache.org/jira/browse/CONNECTORS-907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13915737#comment-13915737 ]

Karl Wright commented on CONNECTORS-907:
----------------------------------------

Hi Abe-san,

I started the crawl and took a thread dump.  This is what the fetch thread is
doing while it reads the seed:

{code}
"Thread-855" daemon prio=6 tid=0x0599f400 nid=0x93c4 in Object.wait() [0x0702f000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x2a1173a0> (a org.apache.manifoldcf.core.throttler.ThrottleBin)
        at org.apache.manifoldcf.core.throttler.ThrottleBin.beginRead(ThrottleBin.java:231)
        - locked <0x2a1173a0> (a org.apache.manifoldcf.core.throttler.ThrottleBin)
        at org.apache.manifoldcf.core.throttler.Throttler$ThrottlingGroup.obtainReadPermission(Throttler.java:807)
        at org.apache.manifoldcf.core.throttler.Throttler$StreamThrottler.obtainReadPermission(Throttler.java:1130)
        at org.apache.manifoldcf.crawler.connectors.webcrawler.ThrottledFetcher$ThrottledInputstream.basicRead(ThrottledFetcher.java:1141)
        at org.apache.manifoldcf.crawler.connectors.webcrawler.ThrottledFetcher$ThrottledInputstream.read(ThrottledFetcher.java:1109)
        at org.apache.manifoldcf.core.common.XThreadInputStream.stuffQueue(XThreadInputStream.java:148)
        at org.apache.manifoldcf.crawler.connectors.webcrawler.ThrottledFetcher$ExecuteMethodThread.run(ThrottledFetcher.java:1477)
{code}


The thread is parked in ThrottleBin.beginRead, so byte-rate throttling is what
is making this slow.  If you increase the maximum bytes per second, the crawl
should get much faster.
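
To illustrate why a byte-rate limit leaves a reader in TIMED_WAITING, here is a minimal sketch of a throttle of this general kind. It is not ManifoldCF's ThrottleBin code: the class SimpleByteRateLimiter, the helper millisToWait, and the averaging scheme are all assumptions made up for this example. The idea is only that a read may not start until enough time has passed to "pay for" the bytes already granted.

```java
/**
 * Hypothetical byte-rate limiter for illustration only.
 * This is NOT the ManifoldCF ThrottleBin implementation.
 */
final class SimpleByteRateLimiter {
    private final double maxBytesPerSecond;
    private final long startNanos = System.nanoTime();
    private long bytesGranted = 0;

    SimpleByteRateLimiter(double maxBytesPerSecond) {
        if (maxBytesPerSecond <= 0) {
            throw new IllegalArgumentException("maxBytesPerSecond must be positive");
        }
        this.maxBytesPerSecond = maxBytesPerSecond;
    }

    /**
     * Returns how many milliseconds must still pass before the average rate
     * (bytesGranted over total elapsed time) is back at or below the limit.
     */
    static long millisToWait(long bytesGranted, long elapsedMillis, double maxBytesPerSecond) {
        long earliestMillis = (long) Math.ceil(bytesGranted * 1000.0 / maxBytesPerSecond);
        return Math.max(0L, earliestMillis - elapsedMillis);
    }

    /** Blocks the caller until it is allowed to read byteCount more bytes. */
    synchronized void beginRead(int byteCount) throws InterruptedException {
        while (true) {
            long elapsedMillis = (System.nanoTime() - startNanos) / 1_000_000L;
            long waitMillis = millisToWait(bytesGranted, elapsedMillis, maxBytesPerSecond);
            if (waitMillis == 0) {
                break;
            }
            // A thread dump taken here shows TIMED_WAITING (on object monitor).
            wait(waitMillis);
        }
        bytesGranted += byteCount;
    }
}
```

In this sketch, after 64000 bytes have been granted with no time elapsed, a limit of 64000 bytes per second makes the next read wait about 1000 ms, while doubling the limit to 128000 halves that wait to about 500 ms. That is the sense in which raising the maximum bytes per second should speed things up.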


> Web connector doesn't work 1.5.1 later
> --------------------------------------
>
>                 Key: CONNECTORS-907
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-907
>             Project: ManifoldCF
>          Issue Type: Bug
>          Components: Web connector
>    Affects Versions: ManifoldCF 1.5.1
>            Reporter: Shinichiro Abe
>
> Today I set up MCF 1.5.1 and configured a web connector job in the example.
> But it didn't work. It fetched only one of the seed URLs.
> I couldn't figure out the reason because my manifoldcf.log said nothing.
> Perhaps we need a new version, 1.5.2.
>  
> This problem occurs in trunk, too.
> But it doesn't occur in *MCF 1.4.1*, which can crawl web pages.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
