> From: Bernhard Huber [mailto:[EMAIL PROTECTED]]
> 
>   hi,
> 
> As I'm not totally happy with the Crawler and Indexer component
> interfaces, I want to address some issues here:
> 
> Today CocoonCrawler exposes:
>  void crawl(URL), and Iterator iterator();
> crawl sets the base URL, and iterator() delivers, one by one, the URLs
> reachable from the base URL.
> I have some headaches using URL objects in the command-line environment.
> The only simple possibility is to use file: URLs, which implies storing
> the crawled xml document to the filesystem. But I want to avoid storing
> it to the filesystem for the sake of performance.
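For concreteness, the contract described above can be sketched as follows (method shapes taken from this mail; the real Cocoon interface may differ, and DummyCrawler is purely a hypothetical stand-in to show the calling pattern):

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class CrawlerSketch {

    // The current contract as described in this mail: crawl() sets the
    // base URL, iterator() then hands out the reachable URLs one by one.
    interface CocoonCrawler {
        void crawl(URL base);
        Iterator iterator();
    }

    // Hypothetical in-memory implementation, only to show the calling
    // pattern; a real crawler would fetch pages and follow links.
    static class DummyCrawler implements CocoonCrawler {
        private final List found = new ArrayList();
        public void crawl(URL base) {
            found.add(base);
        }
        public Iterator iterator() {
            return found.iterator();
        }
    }

    public static void main(String[] args) throws MalformedURLException {
        CocoonCrawler crawler = new DummyCrawler();
        crawler.crawl(new URL("http://localhost:8080/cocoon/index.html"));
        for (Iterator i = crawler.iterator(); i.hasNext();) {
            System.out.println(i.next());
        }
    }
}
```

Note the shape of the contract: the base URL becomes instance state, which is exactly what makes one crawler instance unusable by two callers at once.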
> 
> Thus I was thinking of changing the interface to:
>  void crawl(Source), and Iterator iterator();
> i.e. working with Source objects instead of URL objects.
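A minimal sketch of that change, with a stripped-down stand-in for Cocoon's Source abstraction (the real interface is richer; StringSource is a hypothetical helper for the command-line case):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Iterator;

public class SourceCrawlerSketch {

    // Stand-in for Cocoon's Source abstraction, reduced to the two
    // things a crawler needs: an id and the content stream.
    interface Source {
        String getSystemId();
        InputStream getInputStream() throws IOException;
    }

    // The proposed contract: crawl a Source rather than a URL, so
    // cocoon: or file: sources work without a round trip over http.
    interface CocoonCrawler {
        void crawl(Source base);
        Iterator iterator();
    }

    // Hypothetical in-memory Source, e.g. for command-line use,
    // avoiding any write to the filesystem.
    static class StringSource implements Source {
        private final String systemId;
        private final String content;
        StringSource(String systemId, String content) {
            this.systemId = systemId;
            this.content = content;
        }
        public String getSystemId() {
            return systemId;
        }
        public InputStream getInputStream() {
            return new ByteArrayInputStream(content.getBytes());
        }
    }
}
```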

How about 

  Collection crawl(Source)

? Then the crawler can be ThreadSafe.


Vadim
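The gain here: with Collection crawl(Source), all crawl state can live in locals, so a single instance can serve concurrent callers (Avalon's ThreadSafe marker), whereas the crawl()/iterator() pair forces per-call state onto the instance. A sketch under that assumption (DummyCrawler and the stripped-down Source are hypothetical):

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

public class ThreadSafeCrawlerSketch {

    // Stand-in for Cocoon's Source abstraction, reduced for illustration.
    interface Source {
        String getSystemId();
    }

    // The proposed shape: results are returned directly instead of being
    // held on the instance, so the crawler keeps no per-call state and a
    // single instance can safely be shared between threads.
    interface CocoonCrawler {
        Collection crawl(Source base);
    }

    static class DummyCrawler implements CocoonCrawler {
        public Collection crawl(Source base) {
            List found = new ArrayList();  // local, not a field
            found.add(base.getSystemId()); // a real crawler would follow links
            return found;
        }
    }
}
```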

 
> The LuceneCocoonIndexer should also change from using URL to using
> Source.
> 
> The main reason for this change is that crawling and indexing, as
> implemented today, work only via the http: protocol.
> If you want to index xml documents of the local Cocoon, or if you want
> to create an index in the command-line version of Cocoon, you may not
> be able to use the http: protocol.
> Thus I was thinking about using Source.
> 
> Perhaps someone with a broader and more detailed understanding of the
> Cocoon internals could help me a bit.
> 
> bye bernhard

