[ https://issues.apache.org/jira/browse/CONNECTORS-489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409076#comment-13409076 ]

Karl Wright commented on CONNECTORS-489:
----------------------------------------

Mail from Rene:

We are now able to connect to the IIS proxy. Thanks to the logging
facilities Karl added, we were able to see that this is the fix:

{code}
Index: connectors/webcrawler/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/webcrawler/WebcrawlerConnector.java
===================================================================
--- connectors/webcrawler/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/webcrawler/WebcrawlerConnector.java  (revision 1357379)
+++ connectors/webcrawler/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/webcrawler/WebcrawlerConnector.java  (working copy)
@@ -361,7 +361,7 @@
       String emailAddress = params.getParameter(WebcrawlerConfig.PARAMETER_EMAIL);
       if (emailAddress == null)
         throw new ManifoldCFException("Missing email address");
-      userAgent = "ApacheManifoldCFWebCrawler; "+emailAddress+")";
+      userAgent = "Mozilla/5.0 (ApacheManifoldCFWebCrawler; "+emailAddress+")";
       from = emailAddress;
 
       x = params.getParameter(WebcrawlerConfig.PARAMETER_ROBOTSUSAGE);
{code}

Yes, this is weird: a proxy shouldn't fail on User-Agent settings, but
apparently this one does.
Even Googlebot apparently sends a Mozilla-style User-Agent:
http://www.useragentstring.com/pages/Googlebot/
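
For reference, the string the patched line produces can be sketched as a
standalone helper. This is only an illustration, not the connector code:
the class name UserAgentSketch, the method buildUserAgent and the example
address are made up here, and only the string format is taken from the
patch above.

```java
// Hypothetical standalone sketch; only the string format comes from the patch.
public final class UserAgentSketch {
    private UserAgentSketch() {}

    /**
     * Builds a browser-style User-Agent value: a "Mozilla/5.0" product token
     * followed by a parenthesised comment holding the crawler name and the
     * contact email address.
     */
    public static String buildUserAgent(String emailAddress) {
        if (emailAddress == null)
            throw new IllegalArgumentException("Missing email address");
        return "Mozilla/5.0 (ApacheManifoldCFWebCrawler; " + emailAddress + ")";
    }

    public static void main(String[] args) {
        // Example address is an assumption for illustration only.
        System.out.println(buildUserAgent("crawler-admin@example.com"));
    }
}
```

Note that the old value had a closing parenthesis with no opening one, so
the patch also balances the comment part of the header.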

                
> Some proxies restrict access based on User-Agent header
> -------------------------------------------------------
>
>                 Key: CONNECTORS-489
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-489
>             Project: ManifoldCF
>          Issue Type: Improvement
>          Components: RSS connector, Web connector
>    Affects Versions: ManifoldCF 0.6
>            Reporter: Karl Wright
>            Assignee: Karl Wright
>             Fix For: ManifoldCF 0.6
>
>
> Some ISA proxies restrict access to content based on User-Agent.  We need to 
> have a user-agent header that doesn't fail on these sites.

