Re: Some simple questions about Droids

Ryan McKinley Tue, 31 Mar 2009 12:20:58 -0700


On Mar 31, 2009, at 12:38 PM, Robin Howlett wrote:

Hello,
I've only really taken an introductory look at Droids and ran through the samples. I think I'll be using Droids for an upcoming project. I have a
couple of questions first:
I ran both the SimpleRuntime example and the Cli example through a site I wish to parse. Droids seems to keep an index of the links in the page to parse and those parsed already - where is that list? In memory? Is it the
queue? How big can that queue grow to?

The Simple Queue included in Droids is just an in-memory ConcurrentHashMap.

The site I will be crawling will be around 500,000 pages - is this a number that could be supported? Can the index be persisted using a DB instead of
being stored in memory?


Yes, the interface is easy to implement with a DB backend:
http://svn.apache.org/repos/asf/incubator/droids/trunk/droids-core/src/main/java/org/apache/droids/api/TaskQueue.java

When I use Droids, this is what I use -- it has become too domain specific for me to give back anything too useful now. We should look into adding something into the core that persists to something -- SQL, ehcache, whatever.
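
To make the idea of a persisted queue concrete, here is a minimal sketch. It is an assumption-laden illustration, not the Droids API: `UrlQueue` and `FileBackedUrlQueue` are hypothetical names, the real interface is the TaskQueue linked above, and a plain file stands in for the database so the example runs with only the JDK. It persists only the set of URLs already seen; pending items are still held in memory.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;

/** Hypothetical stand-in for a crawl queue; NOT the Droids TaskQueue interface. */
interface UrlQueue {
    boolean offer(String url) throws IOException; // false if the URL was already seen
    String poll();                                 // null when nothing is pending
    int size();                                    // number of pending URLs
}

/**
 * Logs every accepted URL to a file so the "seen" set survives a restart.
 * A DB-backed version would replace the file append with an INSERT and the
 * reload in the constructor with a SELECT.
 */
class FileBackedUrlQueue implements UrlQueue {
    private final Path log;
    private final Set<String> seen = new HashSet<>();
    private final Queue<String> pending = new ArrayDeque<>();

    FileBackedUrlQueue(Path log) throws IOException {
        this.log = log;
        if (Files.exists(log)) {
            // Reload the URLs recorded by a previous run.
            seen.addAll(Files.readAllLines(log, StandardCharsets.UTF_8));
        }
    }

    @Override
    public synchronized boolean offer(String url) throws IOException {
        if (!seen.add(url)) {
            return false; // duplicate: never queue the same URL twice
        }
        Files.write(log, Collections.singletonList(url), StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        pending.add(url);
        return true;
    }

    @Override
    public synchronized String poll() {
        return pending.poll();
    }

    @Override
    public synchronized int size() {
        return pending.size();
    }
}
```

For a crawl of several hundred thousand pages you would probably also want the pending items and their status in the store, so an interrupted crawl can resume where it stopped.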

Some of the links to content I wish to crawl/parse/index are JavaScript popups - therefore I wish to alter the URL for the crawler to use; this should
be no problem, right?

Should not be a problem -- if you can find the URLs from the parse data you can add them to the Queue.
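
As a rough sketch of what "finding the URLs" could look like for popup links: the helper below pulls the first quoted argument out of a `javascript:` href and resolves it against the page URL. `PopupLinkRewriter` is a hypothetical helper, not part of Droids, and the pattern assumes the popup function takes the target URL as its first quoted argument, which will depend on the site.

```java
import java.net.URI;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper; not part of Droids. */
class PopupLinkRewriter {
    // Assumes links shaped like javascript:openPopup('/item?id=7', 400, 300)
    // or javascript:window.open("detail.html"), with the URL as first argument.
    private static final Pattern FIRST_QUOTED_ARG = Pattern.compile(
            "^javascript:\\s*[\\w.]+\\(\\s*['\"]([^'\"]+)['\"]",
            Pattern.CASE_INSENSITIVE);

    /** Returns the absolute URL hidden in a javascript: href, if one is found. */
    static Optional<URI> rewrite(URI base, String href) {
        Matcher m = FIRST_QUOTED_ARG.matcher(href.trim());
        if (!m.find()) {
            return Optional.empty(); // not a popup link of the assumed shape
        }
        try {
            // Resolve relative targets against the URL of the page being parsed.
            return Optional.of(base.resolve(m.group(1)));
        } catch (IllegalArgumentException e) {
            return Optional.empty(); // extracted text is not a valid URI
        }
    }
}
```

The resulting URI is what you would hand to the queue in place of the original `javascript:` link.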


ryan
