On Mon, 2008-09-15 at 22:35 -0400, Ryan McKinley wrote:
> I'm tearing into things, figuring out what I think the API direction
> is/should be... I'm having trouble writing a coherent message, so
> I'll just send this and see how you feel. In general, I think we need
> to make a bigger separation between what is 'core' and what are the
> building blocks for specific use cases.

My answer ATM will be short since I need to finish some work. First of
all, many thanks for this concept/architecture overview.

>
> CORE
> ================================
> Fundamentally, Droids is a framework to keep a bunch of Workers
> processing Tasks.
>

and Droid (robots, being crawler or racer)

> "Core" components relate to keeping Workers on Task.
>
> From the existing API, I think the following are "core"
>   Queue
>   Task
>   Worker
>   perhaps DelayTimer/Worker
>
> Core should deal with all the threading issues related to managing the
> Tasks. All the ThreadPoolExecutor stuff.

Agree.

>
> Unless I'm missing something, I don't even see why Droid is an
> interface -- it appears to be the parent container for management
> logic. AbstractDroid introduces some shared logic. Is it just that
> makes the manager Runnable?

The interface is there so that one can do

  Droid droid = getDroid(name);
  droid.run();

Every robot needs to implement the interface so that it can be invoked
generically.

>
> The javadoc for Droid run() says: "Invoke an instance of the worker
> used in the droid" but the behavior in HelloCrawler is that run()
> initializes everything and starts the workers. Is there a reason this
> needs to happen in its own Runnable instance?

No, if you can move it I would be delighted. Some code still has legacy
stuff in it, which allowed us to hammer out a first working prototype
but would never win the cleanest coding award. ;)

> It seems the 'core' would focus on things like ThreadPoolExecutor.
>

+1

In addition, one part of core should be dedicated to communication
between worker, droid and core, so that we can create a web interface
that allows you to control different droids and see their current
workload and success.

> I don't see any need for the existing Core.java class -- is it just
> there to make spring configuration easier. This seems like poor
> design since it gives access to everything. In my view, each
> component should only have access to what it needs.

Agree, some methods need to change their visibility and the core needs
a rewrite. It should be dedicated to the above-mentioned points. Maybe
as well in the light of LABS-144.

>
> Is the existing Core.java just part of the Cli helper app? In
>   public void start(String name){
>     Droid droid = getDroid(name);
>     droid.run();
>   }
>

That is one important method of the CLI.

>
>
> COMPONENTS / Blocks? other name?
> =================================
>
> Each Droid implementation would include the 'Core' plus a set of
> components wired together. From the existing API, the things that
> strike me as components are:
>
> Protocol
>   URL >> InputStream
> Parsing
>   InputStream >> Metadata

Parsing will produce SAX events and metadata.

> Handler? Action?
>   Metadata >> something
>   (save to solr)
>   (write to disk)

LABS-149 Extractor: consumes events and extracts Tasks. The first
example is link extraction, since it is a crucial part of crawling.

>
>
> DROIDS
> =========
> We should deliver a few standard use cases where all the plumbing is
> hooked together:
> 1. simple web crawler - HelloCrawler
> 2. simple filesystem walker - FileRenameRacer
> 3. IMAP walker

TBN

Cheers Ryan.

salu2
--
Thorsten Scherler thorsten.at.apache.org
Open Source Java consulting, training and solutions
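
To make the proposed split a bit more concrete, here is a rough sketch of
what a minimal "core" could look like: a Queue feeding Tasks to a Worker
through a thread pool, with Droid kept as the small interface the CLI
invokes generically. Every name below (SimpleTask, SimpleDroid,
DroidSketch, the generic signatures) is hypothetical and only borrows the
vocabulary of this thread; it is not the existing Droids API. It also
assumes the queue is filled before run() is called, which a real crawler
would not do, since workers add new Tasks while running.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// All types here are illustrative assumptions, not the real Droids interfaces.

/** A unit of work, e.g. a URL to fetch or a file to rename. */
interface Task {
    String getId();
}

/** Processes a single Task; knows nothing about threads or queues. */
interface Worker<T extends Task> {
    void execute(T task);
}

/** What the CLI needs: getDroid(name) followed by run(). */
interface Droid {
    void run();
}

final class SimpleTask implements Task {
    private final String id;

    SimpleTask(String id) {
        this.id = id;
    }

    @Override
    public String getId() {
        return id;
    }
}

/** The "core" part: owns the thread pool and keeps the worker on task. */
final class SimpleDroid<T extends Task> implements Droid {
    private final Queue<T> queue;
    private final Worker<T> worker;
    private final int threads;

    SimpleDroid(Queue<T> queue, Worker<T> worker, int threads) {
        this.queue = queue;
        this.worker = worker;
        this.threads = threads;
    }

    @Override
    public void run() {
        // newFixedThreadPool is backed by a ThreadPoolExecutor.
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        T task;
        // Simplification: drain whatever is queued right now, then stop.
        while ((task = queue.poll()) != null) {
            final T current = task;
            pool.execute(() -> worker.execute(current));
        }
        pool.shutdown();
        try {
            // Block until all submitted tasks finish, with an arbitrary cap.
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                pool.shutdownNow();
            }
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
        }
    }
}

class DroidSketch {
    public static void main(String[] args) {
        Queue<SimpleTask> queue = new ConcurrentLinkedQueue<>();
        queue.add(new SimpleTask("http://example.org/"));
        queue.add(new SimpleTask("http://example.org/about"));

        // The worker only sees the task it is handed, nothing else.
        Worker<SimpleTask> printer = t -> System.out.println("processing " + t.getId());

        Droid droid = new SimpleDroid<SimpleTask>(queue, printer, 2);
        droid.run();
    }
}
```

The point of the sketch is that the Worker receives only its Task, while
all threading stays inside the droid, which matches the idea that each
component should only have access to what it needs. Protocol, parsing and
handler components would then be wired into a concrete Worker rather than
into the core.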