[ https://issues.apache.org/jira/browse/CONNECTORS-907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13915737#comment-13915737 ]
Karl Wright commented on CONNECTORS-907:
----------------------------------------
Hi Abe-san,
I started the crawl and got a thread dump. This is what it does while it is
reading the seed:
{code}
"Thread-855" daemon prio=6 tid=0x0599f400 nid=0x93c4 in Object.wait() [0x0702f000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x2a1173a0> (a org.apache.manifoldcf.core.throttler.ThrottleBin)
    at org.apache.manifoldcf.core.throttler.ThrottleBin.beginRead(ThrottleBin.java:231)
    - locked <0x2a1173a0> (a org.apache.manifoldcf.core.throttler.ThrottleBin)
    at org.apache.manifoldcf.core.throttler.Throttler$ThrottlingGroup.obtainReadPermission(Throttler.java:807)
    at org.apache.manifoldcf.core.throttler.Throttler$StreamThrottler.obtainReadPermission(Throttler.java:1130)
    at org.apache.manifoldcf.crawler.connectors.webcrawler.ThrottledFetcher$ThrottledInputstream.basicRead(ThrottledFetcher.java:1141)
    at org.apache.manifoldcf.crawler.connectors.webcrawler.ThrottledFetcher$ThrottledInputstream.read(ThrottledFetcher.java:1109)
    at org.apache.manifoldcf.core.common.XThreadInputStream.stuffQueue(XThreadInputStream.java:148)
    at org.apache.manifoldcf.crawler.connectors.webcrawler.ThrottledFetcher$ExecuteMethodThread.run(ThrottledFetcher.java:1477)
{code}
So byte-rate throttling is what is making this slow. If you increase the
maximum bytes per second, it should get much faster.
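To illustrate why a byte-rate limit leaves the fetch thread in TIMED_WAITING, here is a minimal sketch of the general technique. It is not the actual ThrottleBin code: the class ByteRateSketch, its fields, and the minimumMillis helper are hypothetical names made up for this example, and the real throttler may work quite differently. The idea is only that each read reserves a slice of time proportional to its size, and the next read waits on the monitor until that slice has elapsed.

```java
// Hypothetical sketch of byte-rate throttling; NOT ManifoldCF's ThrottleBin.
class ByteRateSketch {
    private final double maxBytesPerSecond;
    // Earliest wall-clock time (ms) at which the next read may start.
    private long nextAllowedTimeMillis = 0L;

    ByteRateSketch(double maxBytesPerSecond) {
        this.maxBytesPerSecond = maxBytesPerSecond;
    }

    /** Minimum time in ms that byteCount bytes must occupy under the limit. */
    static long minimumMillis(long byteCount, double maxBytesPerSecond) {
        return (long) Math.ceil(byteCount * 1000.0 / maxBytesPerSecond);
    }

    /** Blocks until the caller is allowed to read byteCount more bytes. */
    synchronized void beginRead(int byteCount) throws InterruptedException {
        long now = System.currentTimeMillis();
        while (now < nextAllowedTimeMillis) {
            // A thread parked here appears in a thread dump as
            // TIMED_WAITING (on object monitor) inside Object.wait().
            wait(nextAllowedTimeMillis - now);
            now = System.currentTimeMillis();
        }
        // Reserve the time slice that this read consumes.
        nextAllowedTimeMillis = now + minimumMillis(byteCount, maxBytesPerSecond);
    }

    public static void main(String[] args) {
        // Assumed example figures: a 64 KB page at two different limits.
        System.out.println(minimumMillis(65536, 1000.0) + " ms at 1000 bytes/sec");
        System.out.println(minimumMillis(65536, 1000000.0) + " ms at 1000000 bytes/sec");
    }
}
```

Under these assumed figures the same page takes roughly a thousand times longer at the lower limit, which is the kind of slowdown that can make a crawl look as if it has stalled after the seed.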
> Web connector doesn't work 1.5.1 later
> --------------------------------------
>
> Key: CONNECTORS-907
> URL: https://issues.apache.org/jira/browse/CONNECTORS-907
> Project: ManifoldCF
> Issue Type: Bug
> Components: Web connector
> Affects Versions: ManifoldCF 1.5.1
> Reporter: Shinichiro Abe
>
> Today I set up MCF 1.5.1 and configured a web connector job in the example.
> It did not work: it fetched only one of the seed URLs.
> I could not figure out the reason because my manifoldcf.log said nothing.
> Perhaps we need a new version, 1.5.2.
>
> This problem occurs in trunk, too.
> But *MCF 1.4.1* does not have this problem and can crawl web pages.
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)