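You'll want to authenticate before the crawl starts and use a thread-safe cookie manager in your HTTP library. When you log in, CAS sets a Ticket Granting Cookie (CASTGC) holding your Ticket Granting Ticket (TGT), which stands in for your credentials on later requests. The CAS-protected application then typically issues its own session cookie once the redirect back from CAS completes. In most cases, keeping all of those cookies in one shared store should work. You may also want to enable redirect following if your HTTP library offers it, since the CAS login flow bounces between the application and the CAS server.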
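Here's a rough sketch of that approach using only the Python standard library. It logs in once, then lets several worker threads crawl with the same cookie jar. The CAS and application URLs are hypothetical, and so are the form field names: "username" and "password" match the stock CAS login page, which also carries hidden fields such as "execution", "_eventId", and "lt" on older releases. Check your own login form before relying on any of them.

```python
# Sketch: authenticate against CAS once, then crawl with a shared cookie jar.
# Hypothetical assumptions: the CAS login URL, the service URL, and the form
# field names "username"/"password" must match your deployment.
import http.cookiejar
import urllib.parse
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser


class HiddenInputParser(HTMLParser):
    """Collects name/value pairs from <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attrs = dict(attrs)
        if (attrs.get("type") or "").lower() == "hidden" and attrs.get("name"):
            self.fields[attrs["name"]] = attrs.get("value") or ""


def hidden_fields(html):
    parser = HiddenInputParser()
    parser.feed(html)
    return parser.fields


def build_login_url(cas_login_url, service_url):
    # CAS expects the protected application's URL in the "service" parameter.
    return cas_login_url + "?" + urllib.parse.urlencode({"service": service_url})


def make_opener(jar=None):
    # A single CookieJar holds the CASTGC cookie and the application's session
    # cookie. CPython's CookieJar guards its state with an internal lock, so
    # threads can share one jar. build_opener() also installs the default
    # HTTPRedirectHandler, so CAS redirects are followed automatically.
    jar = jar if jar is not None else http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def cas_login(opener, cas_login_url, service_url, username, password):
    login_url = build_login_url(cas_login_url, service_url)
    # 1. GET the login form to collect its hidden fields and initial cookies.
    with opener.open(login_url) as resp:
        form = hidden_fields(resp.read().decode("utf-8", errors="replace"))
    # 2. POST the credentials. On success CAS redirects to
    #    service_url?ticket=ST-..., and the application sets its session cookie.
    form.update({"username": username, "password": password})
    data = urllib.parse.urlencode(form).encode("ascii")
    with opener.open(login_url, data=data) as resp:
        return resp.geturl()  # final URL after all redirects


def crawl(jar, urls, workers=4):
    # Each worker builds its own opener around the shared, already
    # authenticated jar, so only the cookie store is shared between threads.
    def fetch(url):
        opener, _ = make_opener(jar)
        with opener.open(url) as resp:
            return url, resp.getcode()

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(fetch, urls))
```

Usage would look something like `opener, jar = make_opener()`, then `cas_login(opener, "https://cas.example.edu/cas/login", "https://app.example.edu/", user, pw)`, then `crawl(jar, page_urls)`. All of these URLs are placeholders. If `cas_login` lands back on the CAS login page instead of the service URL, the credentials or form fields were probably rejected.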
--
Build Smarter Software.

Cedric Hurst, Principal
Spantree Technology Group, LLC
1144 W Fulton Market, Suite 120, Chicago, IL
email: [email protected] | phone: 888.386.5501
web: http://www.spantree.net

On Thursday, March 20, 2014 at 4:52 PM, Laura McCord wrote:
> Hi,
>
> This might be a shot in the dark, but I was wondering if anyone has any
> experience with web-crawling a website that is “Casified” but by entering
> your credentials it will proceed to crawl and obtain the content? If so, did
> you use any specific technologies to perform the task?
>
> Thanks,
> Laura
>
> --
> You are currently subscribed to [email protected] as: [email protected]
> To unsubscribe, change settings or access archives, see
> http://www.ja-sig.org/wiki/display/JSG/cas-user
