One idea is to use Apache Nutch, though I’ve never used it before.
I’m brainstorming here, but I wonder whether it would work to create a little
app that asks for credentials and, once they are entered, uses Nutch to crawl
a given website.

Thanks,
Laura


On Mar 20, 2014, at 5:01 PM, Richard Frovarp <[email protected]> wrote:

> On 03/20/2014 04:52 PM, Laura McCord wrote:
>> Hi,
>> 
>> This might be a shot in the dark, but I was wondering if anyone has any 
>> experience with web-crawling a website that is "CASified", where entering 
>> your credentials lets the crawler proceed and obtain the content. If so, did 
>> you use any specific technologies to perform the task?
>> 
>> Thanks,
>>  Laura
>> 
>> 
>> 
> 
> It kind of depends on what you're after here. Are you looking to let 
> Google through, or your own crawler?
> 
> If it's your own, does it even need to be a web crawler? My experience with 
> search is around Apache Solr. In that case, I'd just get the data directly 
> out of the database and put it in Solr. Generally you get better search 
> results if you don't have to mess with those pesky things we call web pages.
> 
> -- 
> You are currently subscribed to [email protected] as: 
> [email protected]
> To unsubscribe, change settings or access archives, see 
> http://www.ja-sig.org/wiki/display/JSG/cas-user

