Hi Rene,

Crawling through a proxy is usually easy, but crawling a session-based site is always a challenge.
ISA proxies usually authenticate with NTLM, so you will need to set up your web connection with NTLM authentication just to be able to reach the pages. It's not clear that you have this right yet; if you don't, you will get 401 errors back. Getting this right is a prerequisite, and you won't be able to proceed until it is correct. To confirm that you do, try a very limited crawl that fetches ONLY the login page (or some other content that is not session-protected). If you get a 401, you'll need to figure out what's wrong before proceeding.

It sounds like the site may also be secured using session-based authentication. If a cookie is involved, then you need to configure session auth in order to get to any session-protected pages. The trick is that, for session-based auth, you need to fully understand the sequence of pages and forms a user goes through when visiting the site and being granted the cookie(s): the login process, which content URLs are protected, which URLs are part of the login sequence, and so on. The end-user documentation describes this in some detail. It can be a challenge to get it all set up right.

Finally, for SharePoint sites, if you intend to index documents, you might well find the SharePoint Connector a better choice than trying to crawl the site with the web connector.

Thanks,
Karl

On Fri, May 11, 2012 at 10:13 AM, Rene Nederhand wrote:
> Hi,
>
> I am trying to get ManifoldCF to crawl our electronic learning
> environment (Blackboard). To enable single sign-on, our institution
> has placed an ISA server as a proxy in front of Blackboard.
> This is giving me a lot of problems.
>
> I've managed to get past the ISA server using session-based
> authentication, but then I am stuck at a 401 error message. According
> to our architect, ISA is responsible for the communication with
> Blackboard and will set a cookie so Blackboard knows that a
> legitimate user is accessing its service. I think ManifoldCF is not
> able to handle this cookie and hence cannot access Blackboard.
> Am I right? If so, is there a way to get Blackboard indexed?
>
> By the way, the same authentication is used for our SharePoint. I
> would like to index this as well.
>
> Any help on solving this problem is appreciated.
>
> Cheers,
>
> René
