Hi,

> I think it's redundant to hardcode the indexing logic into all crawler component 
>(ftp, http, jdbc, filesys crawler). It's an interesting question how the components 
>can communicate? (don't you think using avalon is a good way?)

I've just had a look at Avalon, and it looks promising.

As I've written before, I am thinking of three different component types: sources,
transformers and the indexer (Lucene). I have thought a little about a flexible way
to configure the indexing procedure, and it seems there could be many ways of
combining sources, transformers and Lucene. What do you think about using a
blackboard design pattern? Sources produce records into a central repository.
Transformers register for records with a particular signature and receive those
records for transformation. Finally, once no transformer wants to transform a
record any more, it is delivered to Lucene.
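
To make the blackboard idea a bit more concrete, here is a rough sketch in plain
Java. Everything in it is an assumption made up for illustration: the names
(BlackboardSketch, Rec, Transformer, Blackboard), the use of a content type as the
record "signature", and the indexer being a simple callback. It does not call
Lucene or Avalon at all.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Consumer;

public class BlackboardSketch {

    /** A record on the blackboard: a signature (here a content type) plus a payload. */
    static final class Rec {
        final String signature;
        final String payload;

        Rec(String signature, String payload) {
            this.signature = signature;
            this.payload = payload;
        }
    }

    /** A transformer registers for one signature and turns a record into a new one. */
    interface Transformer {
        String accepts();
        Rec transform(Rec in);
    }

    /** Central repository: sources post records, transformers pick them up. */
    static final class Blackboard {
        private final List<Transformer> transformers = new ArrayList<>();
        private final Deque<Rec> pending = new ArrayDeque<>();
        private final Consumer<Rec> indexer; // stands in for the Lucene indexer

        Blackboard(Consumer<Rec> indexer) {
            this.indexer = indexer;
        }

        void register(Transformer t) {
            transformers.add(t);
        }

        void post(Rec r) {
            pending.add(r);
        }

        /**
         * Hands each record to the first transformer registered for its signature.
         * A record that nobody wants is delivered to the indexer.
         * Note: a transformer that emits the signature it accepts would loop forever.
         */
        void run() {
            while (!pending.isEmpty()) {
                Rec r = pending.poll();
                Transformer match = null;
                for (Transformer t : transformers) {
                    if (t.accepts().equals(r.signature)) {
                        match = t;
                        break;
                    }
                }
                if (match != null) {
                    pending.add(match.transform(r));
                } else {
                    indexer.accept(r);
                }
            }
        }
    }

    /** One HTML source record and one plain-text record, with an HTML stripper registered. */
    static List<String> demo() {
        List<String> indexed = new ArrayList<>();
        Blackboard bb = new Blackboard(r -> indexed.add(r.signature + ":" + r.payload));
        bb.register(new Transformer() {
            public String accepts() {
                return "text/html";
            }

            public Rec transform(Rec in) {
                // Crude tag stripping, for illustration only.
                return new Rec("text/plain", in.payload.replaceAll("<[^>]*>", ""));
            }
        });
        bb.post(new Rec("text/html", "<p>hello</p>"));
        bb.post(new Rec("text/plain", "world"));
        bb.run();
        return indexed;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The point of the sketch is that sources and transformers only know the blackboard,
not each other, so new combinations need no changes to existing components.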

btw: it would be nice if the index could be kept in sync with the indexed data. If
files are deleted, their index entries should be deleted as well.
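
One simple way to find out what has to go, sketched under the assumption that we
keep a list of the paths that were indexed last time: compare it with the paths
the source sees now. The class and method names are made up, and how the entries
are then actually removed from the index is left open here.

```java
import java.util.Set;
import java.util.TreeSet;

public class IndexSyncSketch {

    /** Returns the paths that are in the index but no longer exist in the source. */
    static Set<String> staleEntries(Set<String> indexedPaths, Set<String> currentPaths) {
        Set<String> stale = new TreeSet<>(indexedPaths);
        stale.removeAll(currentPaths);
        return stale;
    }

    public static void main(String[] args) {
        Set<String> indexed = Set.of("a.txt", "b.txt", "c.txt");
        Set<String> current = Set.of("a.txt", "c.txt", "d.txt");
        // Each stale path would then have its index entry deleted.
        System.out.println(staleEntries(indexed, current));
    }
}
```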

regards,

manfred


