> From: Bernhard Huber [mailto:[EMAIL PROTECTED]]
>
> hi,
>
> As I'm not totally happy with the Crawler, Indexer component interfaces
> I want to address issues here:
>
> Today CocoonCrawler exposes:
> void crawl(URL), and Iterator iterator();
> crawl sets the base url, and iterator() delivers one more URL reachable
> from the base url.
> I have some headaches using URL objects in the commandline environment.
> The only simple possibility is to use file: URLs, which implies storing
> the xml document which has been crawled to the filesystem. But storing
> it to the filesystem I want to avoid for the sake of performance.
>
> Thus I was thinking of changing the interface to:
> void crawl(Source), and Iterator iterator();
> Thus working with Source objects instead of URL objects.

How about Collection crawl(Source)? Then the crawler can be ThreadSafe.

Vadim

> The LuceneCocoonIndexer should also change from using URL to using Source.
>
> The main reason for this change is that crawling and indexing, as
> implemented today, work only using the http: protocol.
> If you want to index xml documents of the local cocoon, or if you want
> to create an index in the command line version of Cocoon, you may not be
> able to use the http protocol.
> Thus I was thinking about using Source.
>
> Perhaps someone having a broader, and more detailed understanding of the
> Cocoon internals could help me a bit.
>
> bye bernhard
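
To make the Collection crawl(Source) idea concrete, here is a rough sketch. It is not Cocoon code: SketchSource, SketchCrawler, SimpleSource and MapBackedCrawler are hypothetical names made up for illustration, SketchSource is only a stand-in for the real Source, and the link graph is an in-memory map instead of parsed documents. The point is only that once crawl() returns its result, the component has no per-crawl fields left, so one instance can be shared between threads.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for Source; only an identifier is needed here.
interface SketchSource {
    String getSystemId();
}

// Hypothetical stateless contract: the result is returned, not kept in the crawler.
interface SketchCrawler {
    Collection<SketchSource> crawl(SketchSource base);
}

final class SimpleSource implements SketchSource {
    private final String systemId;

    SimpleSource(String systemId) {
        this.systemId = systemId;
    }

    public String getSystemId() {
        return systemId;
    }
}

// Toy crawler: outgoing links are looked up in a map (assumption, for illustration only).
final class MapBackedCrawler implements SketchCrawler {
    private final Map<String, List<String>> links; // never modified by crawl()

    MapBackedCrawler(Map<String, List<String>> links) {
        this.links = links;
    }

    public Collection<SketchSource> crawl(SketchSource base) {
        // All traversal state lives in local variables of this call.
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.add(base.getSystemId());
        while (!pending.isEmpty()) {
            String current = pending.poll();
            if (!seen.add(current)) {
                continue; // already visited
            }
            List<String> targets = links.get(current);
            pending.addAll(targets != null ? targets : Collections.<String>emptyList());
        }
        // The result contains the base itself plus everything reachable from it.
        List<SketchSource> result = new ArrayList<>();
        for (String id : seen) {
            result.add(new SimpleSource(id));
        }
        return result;
    }
}
```

With the crawl(Source) plus iterator() variant, the base and the iteration position have to be stored in the crawler between the two calls, so each caller needs its own instance. Returning the collection avoids that, at the price of holding the whole result in memory at once.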