Additionally, how hard would it be to add crawlers for things like:
1. IMAP and other mail stores (even things like PST files, etc.)
2. Somewhat strange: Databases. Just point it at a DB and have it
suck in tables/rows/columns
3. Things like web APIs (Flickr, del.icio.us, etc),
Any comments on fault tolerance and incremental crawling would also be
appreciated. Is there anything in the current design that you think
prevents these things?
Thanks,
Grant
On Aug 27, 2008, at 5:26 PM, Grant Ingersoll wrote:
Is there a feature list for Droids anywhere?
Or, can it do:
1. Honor robots.txt
2. Crawl throttling
3. Distributed crawling (i.e. give a bunch of links to it and some
distributed compute resources and have it go to town)
Thanks,
Grant
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]