On Mon, 2008-09-15 at 22:35 -0400, Ryan McKinley wrote:
> I'm tearing into things, figuring out what I think the API direction  
> is/should be...  I'm having trouble writing a coherent message, so  
> I'll just send this and see how you feel.  In general, I think we need  
> to make a bigger separation between what is 'core' and what are the  
> building blocks for specific use cases.

My answer ATM will be short since I need to finish some work.

First of all, many thanks for this concept/architecture overview.

> 
> CORE
> ================================
> Fundamentally, Droids is a framework to keep a bunch of Workers  
> processing Tasks.
> 

and Droids (the robots, whether crawler or racer)

> "Core" components relate to keeping Workers on Task.
> 
>  From the existing API, I think the following are "core"
>    Queue
>    Task
>    Worker
>    perhaps DelayTimer/Worker
> 
> Core should deal with all the threading issues related to managing the  
> Tasks.  All the ThreadPoolExecutor stuff.


Agree.
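
As a rough sketch of that split (all names below are my assumptions, not the current Droids API), core could own the thread pool and only hand Tasks to a Worker, so a Droid never touches threads directly:

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical shapes; the real Droids Task and Worker interfaces may differ.
interface Task {
    String getId();
}

interface Worker<T extends Task> {
    void execute(T task) throws Exception;
}

// Hypothetical "core" class: it owns the pool and keeps the Worker on Task.
class TaskMaster<T extends Task> {
    private final ExecutorService pool;
    private final Worker<T> worker;
    private final List<String> completed = new CopyOnWriteArrayList<>();

    TaskMaster(int threads, Worker<T> worker) {
        // newFixedThreadPool is backed by a ThreadPoolExecutor.
        this.pool = Executors.newFixedThreadPool(threads);
        this.worker = worker;
    }

    void submit(T task) {
        pool.execute(() -> {
            try {
                worker.execute(task);
                completed.add(task.getId());
            } catch (Exception e) {
                // A real implementation would report the failure; this sketch only logs it.
                System.err.println("Task " + task.getId() + " failed: " + e);
            }
        });
    }

    // Stop accepting tasks and wait for the submitted ones to finish.
    boolean shutdownAndWait(long timeout, TimeUnit unit) {
        pool.shutdown();
        try {
            return pool.awaitTermination(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }

    List<String> completedIds() {
        return completed;
    }
}
```

With something like this, a crawler or racer would only supply its Worker and feed in Tasks.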

> 
> Unless I'm missing something, I don't even see why Droid is an  
> interface -- it appears to be the parent container for management  
> logic.  AbstractDroid introduces some shared logic.  Is it just that  
> makes the manager Runnable?

The interface is there so that we can do:
Droid droid = getDroid(name);
droid.run();

Every robot needs to implement the interface so that it can be invoked generically.
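
To make the point concrete, here is a minimal sketch of that lookup. DroidRegistry is a hypothetical stand-in for whatever backs getDroid(name); only Droid and run() come from the existing code:

```java
import java.util.HashMap;
import java.util.Map;

// The only contract the caller relies on.
interface Droid {
    void run();
}

// Hypothetical registry; in the current code the lookup lives elsewhere.
class DroidRegistry {
    private final Map<String, Droid> droids = new HashMap<>();

    void register(String name, Droid droid) {
        droids.put(name, droid);
    }

    Droid getDroid(String name) {
        Droid droid = droids.get(name);
        if (droid == null) {
            throw new IllegalArgumentException("Unknown droid: " + name);
        }
        return droid;
    }

    // The same two calls as above, wrapped for a caller such as a CLI.
    void start(String name) {
        getDroid(name).run();
    }
}
```

The caller never needs to know whether it is starting a crawler or a racer; it only needs the name.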

> 
> The javadoc for Droid run() says: "Invoke an instance of the worker  
> used in the droid" but the behavior in HelloCrawler is that run()  
> initializes everything and starts the workers.  Is there a reason this  
> needs to happen in its own Runnable instance?

No, if you can move it I would be delighted. Some code still has legacy
stuff in it, which allowed us to hammer out a first working prototype but
would never win the cleanest coding award. ;)

> It seems the 'core' would focus on things like ThreadPoolExecutor.
> 

+1

In addition, one part of core should be dedicated to communication
between worker, droid and core, so that we can create a web interface
that allows you to control different droids and see their current
workload and success.
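
Just to illustrate the kind of information I have in mind, a sketch with hypothetical names (DroidStats and StatusBoard do not exist in the code base): workers update counters, and a web interface could read them per droid.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical per-droid counters, safe to update from many worker threads.
class DroidStats {
    private final AtomicLong pendingCount = new AtomicLong();
    private final AtomicLong successCount = new AtomicLong();
    private final AtomicLong failureCount = new AtomicLong();

    void taskQueued() { pendingCount.incrementAndGet(); }

    void taskSucceeded() {
        pendingCount.decrementAndGet();
        successCount.incrementAndGet();
    }

    void taskFailed() {
        pendingCount.decrementAndGet();
        failureCount.incrementAndGet();
    }

    long pending() { return pendingCount.get(); }
    long succeeded() { return successCount.get(); }
    long failed() { return failureCount.get(); }
}

// Hypothetical board that the core would expose to a monitoring front end.
class StatusBoard {
    private final Map<String, DroidStats> stats = new ConcurrentHashMap<>();

    // Returns the counters for a droid, creating them on first use.
    DroidStats statsFor(String droidName) {
        return stats.computeIfAbsent(droidName, name -> new DroidStats());
    }

    String summary(String droidName) {
        DroidStats s = statsFor(droidName);
        return droidName + ": pending=" + s.pending()
                + " succeeded=" + s.succeeded()
                + " failed=" + s.failed();
    }
}
```

Controlling the droids (start, stop, pause) would need more than this; the sketch only covers the reporting half.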

> I don't see any need for the existing Core.java class -- is it just  
> there to make spring configuration easier.  This seems like poor  
> design since it gives access to everything.  In my view, each  
> component should only have access to what it needs.

Agreed, some methods need to change their visibility and the core needs
a rewrite. It should be dedicated to the above-mentioned points.

Maybe also in light of LABS-144.

> 
> Is the existing Core.java just part of the Cli helper app?  In
>    public void start(String name){
>      Droid droid = getDroid(name);
>      droid.run();
>    }
> 

That is one important method of the CLI. 

> 
> 
> COMPONENTS / Blocks?  other name?
> =================================
> 
> Each Droid implementation would include the 'Core' plus a set of  
> components wired together.  From the existing API, the things that  
> strike me as components are:
> 
> Protocol
>    URL >> InputStream
> Parsing
>    InputStream >> Metadata

parsing will produce SAX events and metadata

> Handler?  Action?
>    Metadata >> something
>    (save to solr)
>    (write to disk)

LABS-149
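
Wired together, the three components listed above could look roughly like this. The interface names and signatures are assumptions for illustration only, and the sketch ignores SAX events and treats metadata as a plain map:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.Map;

// Hypothetical component contracts, one per arrow in the list above.
interface Protocol {
    InputStream open(URI uri) throws IOException;          // URL >> InputStream
}

interface Parser {
    Map<String, String> parse(InputStream in) throws IOException; // InputStream >> Metadata
}

interface Handler {
    void handle(URI uri, Map<String, String> metadata) throws IOException; // Metadata >> something
}

// Each component only sees what it needs; the pipeline does the wiring.
class Pipeline {
    private final Protocol protocol;
    private final Parser parser;
    private final Handler handler;

    Pipeline(Protocol protocol, Parser parser, Handler handler) {
        this.protocol = protocol;
        this.parser = parser;
        this.handler = handler;
    }

    void process(URI uri) {
        // The stream is closed once the parser and handler are done with it.
        try (InputStream in = protocol.open(uri)) {
            handler.handle(uri, parser.parse(in));
        } catch (IOException e) {
            throw new UncheckedIOException("Failed to process " + uri, e);
        }
    }
}
```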

Extractor: consumes events and extracts Tasks.

The first example is link extraction, since it is a crucial part of crawling.
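
A minimal sketch of such a link extractor, assuming the parser delivers plain SAX events. LinkTask and LinkExtractor are hypothetical names, and the extract() helper only exists to try it out on well-formed markup with the JDK's own SAX parser:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical task type: just the link to visit next.
class LinkTask {
    final String uri;

    LinkTask(String uri) {
        this.uri = uri;
    }
}

// Consumes SAX events and collects one LinkTask per <a href="...">.
class LinkExtractor extends DefaultHandler {
    private final List<LinkTask> tasks = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        if ("a".equalsIgnoreCase(qName)) {
            String href = attrs.getValue("href");
            if (href != null && !href.isEmpty()) {
                tasks.add(new LinkTask(href));
            }
        }
    }

    List<LinkTask> getTasks() {
        return tasks;
    }

    // Convenience for experiments; real pages would need an HTML-tolerant parser.
    static List<LinkTask> extract(String xhtml) {
        LinkExtractor extractor = new LinkExtractor();
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                    new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)), extractor);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse markup", e);
        }
        return extractor.getTasks();
    }
}
```

The extracted tasks would then go back into the Queue, which is what keeps a crawl going.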
 


> 
> 
> DROIDS
> =========
> We should deliver a few standard use cases where all the plumbing is  
> hooked together:
> 1. simple web crawler

- HelloCrawler

> 2. simple filesystem walker

- FileRenameRacer

> 3. IMAP walker

TBN

Cheers Ryan.

salu2
-- 
Thorsten Scherler                                 thorsten.at.apache.org
Open Source Java                      consulting, training and solutions

