Hi Karl,

I need to crawl a sequence of (different) URLs from the same host, where each URL defines the next one to be crawled, and I can crawl the next URL only after a specified amount of time. Of course I can call Thread.sleep() before calling activities.addDocumentReference(newUrl), but that seems too naïve... This use case is very similar to a generic Web crawl, where we need to be polite and wait 2-3 seconds before fetching from the same domain again.
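
To illustrate the per-host delay described above without putting a worker thread to sleep, here is a minimal sketch in plain Java. It only computes the earliest time at which the next fetch for a host may run. The names PolitenessScheduler and reserveSlot are hypothetical and not part of ManifoldCF, and how the returned timestamp would be handed to the crawler's queue is deliberately left open.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (not a ManifoldCF class): tracks, per host, the
// earliest time the next fetch is allowed, given a fixed politeness delay.
final class PolitenessScheduler {
    private final long delayMillis;
    private final Map<String, Long> nextAllowed = new HashMap<>();

    PolitenessScheduler(long delayMillis) {
        this.delayMillis = delayMillis;
    }

    // Returns the time (epoch millis) at which a fetch for this host may run,
    // and reserves that slot so the following fetch is pushed back by the delay.
    synchronized long reserveSlot(String host, long nowMillis) {
        long earliest = Math.max(nowMillis, nextAllowed.getOrDefault(host, 0L));
        nextAllowed.put(host, earliest + delayMillis);
        return earliest;
    }

    public static void main(String[] args) {
        PolitenessScheduler scheduler = new PolitenessScheduler(2000L);
        long now = 10_000L;
        System.out.println(scheduler.reserveSlot("example.com", now)); // 10000
        System.out.println(scheduler.reserveSlot("example.com", now)); // 12000
        System.out.println(scheduler.reserveSlot("other.org", now));   // 10000
    }
}
```

The point of the design is that the delay becomes a scheduled time attached to the document rather than a blocked thread, so other hosts can still be processed in the meantime.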

-----Original Message-----
From: Karl Wright [mailto:daddy...@gmail.com]
Sent: April-05-11 11:06 AM
To: connectors-user@incubator.apache.org
Subject: Re: How to add tast to queue dynamically (WebCrawler)

If you are trying to control the schedule for the FIRST time a document is fetched, the IProcessActivity API doesn't permit that at this time. You would need to add a new version of addDocumentReference() to the IProcessActivity interface, which allowed you to set the scheduled processing time in addition to everything else. The internals for such a change should be straightforward since all the moving parts are already there.

I'm curious, however, about your use case. It is currently unheard of for connectors to try to control the scheduling of all documents being fetched - this would interfere with ManifoldCF's scheduling algorithms, which are designed for maximum throughput. I'd like to be sure your design makes sense before I agree that this is a reasonable addition to the API. Can you explain the connector and its design so that I can see what you are trying to accomplish?

Thanks!
Karl

On Tue, Apr 5, 2011 at 10:51 AM, Fuad Efendi <f...@efendi.ca> wrote:
> Hi Karl,
>
> So this is "retry"... can we schedule document retrieval? I retrieve
> XML, generate new URL, and I want to schedule this new Document to be
> retrieved at specific time
>
> -Fuad