One idea was to use Apache Nutch, though I've never used it before. I'm brainstorming here, but I'm wondering whether this would work: a little app that asks for credentials and, once they are entered, crawls a given website using Nutch.

Thanks,
Laura

On Mar 20, 2014, at 5:01 PM, Richard Frovarp <[email protected]> wrote:

> On 03/20/2014 04:52 PM, Laura McCord wrote:
>> Hi,
>>
>> This might be a shot in the dark but, I was wondering if anyone has any
>> experience with web-crawling a website that is "Casified" but by entering
>> your credentials it will proceed to crawl and obtain the content? If so, did
>> you use any specific technologies to perform the task?
>>
>> Thanks,
>> Laura
>>
>
> It kind of depends on what you're after here. Are you looking at letting
> Google through, or your own crawler?
>
> If it's your own, does it even need to be a web crawler? My experience with
> search is around Apache Solr. In that case, I'd just get the data directly
> out of the database and put it in Solr. Generally you get better search
> results if you don't have to mess with those pesky things we call web pages.
>
> --
> You are currently subscribed to [email protected] as:
> [email protected]
> To unsubscribe, change settings or access archives, see
> http://www.ja-sig.org/wiki/display/JSG/cas-user
