I'm tearing into things, figuring out what I think the API direction
is/should be... I'm having trouble writing a coherent message, so
I'll just send this and see how you feel. In general, I think we need
to make a bigger separation between what is 'core' and what are the
building blocks for specific use cases.
CORE
================================
Fundamentally, Droids is a framework to keep a bunch of Workers
processing Tasks.
"Core" components relate to keeping Workers on Task.
From the existing API, I think the following are "core":
Queue
Task
Worker
perhaps DelayTimer/Worker
Core should deal with all the threading issues related to managing the
Tasks. All the ThreadPoolExecutor stuff.
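
To make that concrete, here is a minimal sketch of what a threading-only core could look like. All names here (Task.getId(), Worker.execute(), TaskMaster) are hypothetical and are not the existing Droids API. The executor's internal work queue stands in for the Queue component, and a Worker may return follow-up Tasks (for a crawler, newly discovered links).

```java
import java.util.Collection;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical core types; names are assumptions, not the current API.
interface Task {
    String getId();
}

interface Worker<T extends Task> {
    // Process one task and return any follow-up tasks (may be empty).
    List<T> execute(T task) throws Exception;
}

// Owns all the threading: nothing outside this class touches the pool.
final class TaskMaster<T extends Task> {
    private final Worker<T> worker;
    private final ThreadPoolExecutor pool;
    private final AtomicInteger pending = new AtomicInteger();
    private final AtomicInteger completed = new AtomicInteger();
    private final CountDownLatch done = new CountDownLatch(1);

    TaskMaster(Worker<T> worker, int threads) {
        this.worker = worker;
        this.pool = new ThreadPoolExecutor(threads, threads, 0L,
                TimeUnit.MILLISECONDS, new LinkedBlockingQueue<Runnable>());
    }

    // One-shot: runs the seeds and all follow-ups, blocks until no work
    // is left, then returns the number of tasks that were executed.
    int processAll(Collection<T> seeds) throws InterruptedException {
        if (!seeds.isEmpty()) {
            // Count everything before submitting so pending cannot
            // reach zero while seeds are still being handed over.
            pending.addAndGet(seeds.size());
            for (T seed : seeds) {
                submit(seed);
            }
            done.await();
        }
        pool.shutdown();
        return completed.get();
    }

    private void submit(final T task) {
        pool.execute(() -> {
            try {
                List<T> followUps = worker.execute(task);
                pending.addAndGet(followUps.size());
                for (T next : followUps) {
                    submit(next);
                }
            } catch (Exception e) {
                // Sketch only: a failed task is dropped, not retried.
            } finally {
                completed.incrementAndGet();
                if (pending.decrementAndGet() == 0) {
                    done.countDown();
                }
            }
        });
    }
}
```

With something like this, a Droid would only supply a Worker and some seed Tasks; none of the ThreadPoolExecutor details would leak out of the core.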
Unless I'm missing something, I don't even see why Droid is an
interface -- it appears to be the parent container for management
logic. AbstractDroid introduces some shared logic. Is it just that it
makes the manager Runnable?
The javadoc for Droid run() says: "Invoke an instance of the worker
used in the droid" but the behavior in HelloCrawler is that run()
initializes everything and starts the workers. Is there a reason this
needs to happen in its own Runnable instance?
It seems the 'core' would focus on things like ThreadPoolExecutor.
I don't see any need for the existing Core.java class -- is it just
there to make Spring configuration easier? This seems like poor
design since it gives access to everything. In my view, each
component should only have access to what it needs.
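
As a small illustration of "only what it needs": in the sketch below a component is handed a single narrow interface through its constructor instead of a reference to a container that holds everything. LinkSink and LinkExtractor are made-up names for the example, and the whitespace-based link detection is deliberately naive.

```java
// Hypothetical narrow dependency: the only thing the extractor may do
// with the rest of the system is hand over a link.
interface LinkSink {
    void offer(String link);
}

final class LinkExtractor {
    private final LinkSink sink;

    // The dependency is passed in; there is no global lookup.
    LinkExtractor(LinkSink sink) {
        this.sink = sink;
    }

    // Offers every whitespace-separated token that looks like an
    // http(s) URL and returns how many were offered.
    int extract(String text) {
        int count = 0;
        for (String token : text.split("\\s+")) {
            if (token.startsWith("http://") || token.startsWith("https://")) {
                sink.offer(token);
                count++;
            }
        }
        return count;
    }
}
```

The extractor cannot reach the thread pool, the protocol layer, or any other component, which also makes it trivial to test with a stub sink.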
Is the existing Core.java just part of the Cli helper app? In
    public void start(String name) {
        Droid droid = getDroid(name);
        droid.run();
    }
COMPONENTS / Blocks? other name?
=================================
Each Droid implementation would include the 'Core' plus a set of
components wired together. From the existing API, the things that
strike me as components are:
Protocol
  URL >> InputStream
Parsing
  InputStream >> Metadata
Handler? Action?
  Metadata >> something
    (save to Solr)
    (write to disk)
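
The three stages above could be sketched as small interfaces that a Droid wires together. Everything here is an assumption for illustration: the interface and method names are invented, Metadata is reduced to a Map<String, String>, and java.net.URI is used in place of URL so that non-HTTP sources (files, IMAP) fit the same shape.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.util.Map;

// URL >> InputStream
interface Protocol {
    InputStream open(URI uri) throws IOException;
}

// InputStream >> Metadata (a plain map in this sketch)
interface Parser {
    Map<String, String> parse(InputStream in) throws IOException;
}

// Metadata >> something (save to Solr, write to disk, ...)
interface Handler {
    void handle(URI uri, Map<String, String> metadata) throws IOException;
}

// Hypothetical glue: one pass of a single URI through all three stages.
final class Pipeline {
    private final Protocol protocol;
    private final Parser parser;
    private final Handler handler;

    Pipeline(Protocol protocol, Parser parser, Handler handler) {
        this.protocol = protocol;
        this.parser = parser;
        this.handler = handler;
    }

    void process(URI uri) throws IOException {
        // try-with-resources closes the stream even if parsing fails.
        try (InputStream in = protocol.open(uri)) {
            handler.handle(uri, parser.parse(in));
        }
    }
}
```

A web crawler and a filesystem walker would then differ mainly in which Protocol and Handler they plug in.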
DROIDS
=========
We should deliver a few standard use cases where all the plumbing is
hooked together:
1. simple web crawler
2. simple filesystem walker
3. IMAP walker