Shinichiro Abe created CONNECTORS-854:
-----------------------------------------

             Summary: Enable STALE_CONNECTION_CHECK
                 Key: CONNECTORS-854
                 URL: https://issues.apache.org/jira/browse/CONNECTORS-854
             Project: ManifoldCF
          Issue Type: Improvement
          Components: Web connector
    Affects Versions: ManifoldCF 1.4.1
            Reporter: Shinichiro Abe
            Priority: Minor
             Fix For: ManifoldCF 1.5


When crawling some sites (< 1000 docs), manifoldcf.log sometimes shows the 
following "The target server failed to respond" messages. It seems that a 
NoHttpResponseException is thrown in ThrottledFetcher.

{noformat}
 WARN 2014-01-09 12:39:16,701 (Worker thread '10') - Pre-ingest service interruption reported for job 1389238470356 connection '1': Timed out waiting for response for 'http://www.rondhuit.com/?p=1890': The target server failed to respond
 WARN 2014-01-09 12:39:55,509 (Worker thread '7') - Pre-ingest service interruption reported for job 1389238470356 connection '1': Timed out waiting for response for 'http://www.rondhuit.com/?p=675': The target server failed to respond
{noformat}

Fetching the same pages again after the retry interval (15 minutes) had passed 
succeeded.

I changed the httpclient configuration as follows and confirmed that the message 
was no longer shown.

{noformat}
+++ connectors/webcrawler/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/webcrawler/ThrottledFetcher.java
@@ -463,7 +463,7 @@
         BasicHttpParams params = new BasicHttpParams();
         params.setParameter(ClientPNames.DEFAULT_HOST,fetchHost);
         params.setBooleanParameter(CoreConnectionPNames.TCP_NODELAY,true);
-        params.setBooleanParameter(CoreConnectionPNames.STALE_CONNECTION_CHECK,false);
+        params.setBooleanParameter(CoreConnectionPNames.STALE_CONNECTION_CHECK,true);
         params.setBooleanParameter(ClientPNames.ALLOW_CIRCULAR_REDIRECTS,true);
{noformat}

I know two users who are hitting this issue and have resolved it by turning on 
the stale connection check.
With the check enabled, the crawling job also finishes more quickly than with it 
disabled, because there are no retry fetches.

May I switch the stale connection check from false to true, the same as 
SolrConnector's httpclient configuration?
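
For background, here is a minimal sketch of what a stale connection check does in principle. It uses only java.net and is not HttpClient's or ManifoldCF's actual code; the class name StaleCheckSketch and the methods isStale and demo are hypothetical names chosen for illustration. The idea is that a reused keep-alive connection may already have been closed by the server, and a very short read detects that before a request is sent on it.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical illustration only; not taken from HttpClient or ManifoldCF.
class StaleCheckSketch {

    /**
     * Returns true if the peer appears to have closed the connection.
     * A 1 ms read that hits end-of-stream (or fails) means the socket is stale;
     * a timeout means it is still open and idle.
     * Assumption: no unread response data is pending, since this sketch would
     * consume one byte of it. A real client has to buffer that byte.
     */
    static boolean isStale(Socket socket) {
        try {
            int previousTimeout = socket.getSoTimeout();
            try {
                socket.setSoTimeout(1);
                return socket.getInputStream().read() == -1;
            } finally {
                socket.setSoTimeout(previousTimeout);
            }
        } catch (SocketTimeoutException e) {
            return false; // nothing to read within 1 ms: connection still open
        } catch (IOException e) {
            return true;  // any other I/O failure: treat as unusable
        }
    }

    /**
     * Opens a loopback connection, probes it while the server side is open,
     * then closes the server side and probes it again.
     * Returns { staleWhileOpen, staleAfterServerClose }.
     */
    static boolean[] demo() throws IOException, InterruptedException {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket serverSide = server.accept()) {
            boolean staleWhileOpen = isStale(client);
            serverSide.close();   // server drops the idle keep-alive connection
            Thread.sleep(200);    // give the close time to reach the client
            boolean staleAfterServerClose = isStale(client);
            return new boolean[] { staleWhileOpen, staleAfterServerClose };
        }
    }

    public static void main(String[] args) throws Exception {
        boolean[] result = demo();
        System.out.println("stale while server keeps it open: " + result[0]);
        System.out.println("stale after server closed it: " + result[1]);
    }
}
```

Without such a probe, the client only learns that the connection is dead when the request gets no response, which would be consistent with the "The target server failed to respond" warnings above. The trade-off is a small delay on each reuse of a connection.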




--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
