Hi all,

I have noticed some confusion lately about the different components in Droids and their purpose.
Last night I added the following to the documentation:

"Droids (plural) is not designed for one specific use case; it is a framework: take what you need, do what you want, impossible is nothing. It follows the Cocoon/UNIX philosophy for automated task processing in Java.

As a reminder, a pipe in UNIX starts with an invoking component (which produces a stream) and then chains as many other components as are needed to act on that stream. Each component passes its modifications on to the next component in the chain. For example, the following command on a UNIX box launches a Subversion command to check the status of the local svn checkout (svn st). The next command keeps only the files that are not under svn control (grep '^?'). The next command modifies the stream to create a command that adds these files to the repository (awk ...). The last step invokes the generated commands by sending them to the shell (sh).

  svn st | grep '^?' | awk '{print "svn add "$2}' | sh

In Droids you pipe/process your tasks through small specialist components that, combined, solve your task. So far Droids offers the following components:

* Queue: the queue is the data structure in which the different tasks wait for service.
* Protocol: the protocol interface is a wrapper that hides the underlying implementation of the communication at protocol level.
* Parser -> Apache Tika: the parser component is just a wrapper for Tika, since Tika offers everything we need and there is no need to duplicate the effort. The parser component parses different input types into SAX events.
* Handler: a handler is a component that uses the original stream and/or the parse (the ContentHandler coming from Tika) and the URL to invoke arbitrary business logic on the objects. Unlike the other components, several different handlers can be applied to the same stream/parse.

A Droid (singular), however, is all about ONE specific use case. For example, the helloCrawler is a wget-style crawler.
That means it goes to a page, extracts the links and afterwards saves the page to the file system. The focus of the helloCrawler is this specific use case, and to solve it, it uses different components. In the future, different subprojects could evolve that provide specialist components for a specific use case. However, if components get used in different use cases, they should be considered common."

In light of LABS-149 and the move to reuse Tika 100% in the parser phase (LABS-118), we will need a new component: a LinkExtractor or, with a more generic name, a TaskExtractor. This extractor component should act on the SAX events the parser produces and return the outlinks that meet the filter conditions.

Once the Tika integration is finished and the extractor component is added, the helloCrawler will have the following flow:

queue -> protocol (opens stream) -> parser (receives stream and transforms it to SAX) -> extractor (since we are crawling, we want to extract links from the stream) -> handler (uses the original stream and saves it to disk)

The helloCrawler is a crawler, meaning we start the queue with a single page and, while processing it, extract new tasks that change the queue.

There is another typical use case for Droids as well. I will call it a "racer" (an anagram of crawler). Racers do not try to extract new tasks; they start with a limited number of tasks that are defined in the initQueue method. I will try to add an example of a file racer tonight, since I have a nice use case for it (I need to clean up the names of various files in a directory: removing special characters and bringing them into a specific form).

Hope that clears up the different components a bit.

salu2
--
Thorsten Scherler thorsten.at.apache.org
Open Source Java consulting, training and solutions
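The crawler flow (queue -> protocol -> parser -> extractor -> handler) and the racer idea described above can be sketched in plain Java. This is a minimal sketch under stated assumptions, not the Droids API: the names CrawlerSketch, LinkExtractor, crawl and race are hypothetical, an in-memory map stands in for the protocol, the JDK's own SAX parser stands in for Tika (which also delivers its output as SAX events), and a map stands in for saving pages to disk. The file-name clean-up rule in race is likewise an illustrative assumption, not the file racer mentioned above.

```java
import java.io.StringReader;
import java.util.*;
import java.util.function.Predicate;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class CrawlerSketch {

    /** Hypothetical extractor: listens to SAX events and collects outlinks that pass a filter. */
    static class LinkExtractor extends DefaultHandler {
        private final Predicate<String> filter;
        private final List<String> outlinks = new ArrayList<>();

        LinkExtractor(Predicate<String> filter) { this.filter = filter; }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            // Links arrive as <a href="..."> start-element events
            if ("a".equalsIgnoreCase(qName)) {
                String href = atts.getValue("href");
                if (href != null && filter.test(href)) {
                    outlinks.add(href);
                }
            }
        }

        List<String> getOutlinks() { return outlinks; }
    }

    /** Hypothetical crawler: extracted links become new tasks, so the queue grows while processing. */
    public static Map<String, String> crawl(String start, Map<String, String> web, Predicate<String> filter) {
        Deque<String> queue = new ArrayDeque<>(List.of(start)); // Queue: tasks waiting for service
        Set<String> seen = new HashSet<>(queue);
        Map<String, String> saved = new LinkedHashMap<>();      // stands in for the file system
        SAXParserFactory factory = SAXParserFactory.newInstance();

        while (!queue.isEmpty()) {
            String url = queue.poll();
            String content = web.get(url);                      // Protocol: in-memory lookup instead of HTTP
            if (content == null) continue;

            LinkExtractor extractor = new LinkExtractor(filter);
            try {
                // Parser: the JDK SAX parser stands in for Tika here
                factory.newSAXParser().parse(new InputSource(new StringReader(content)), extractor);
            } catch (Exception e) {
                throw new IllegalStateException("could not parse " + url, e);
            }

            for (String link : extractor.getOutlinks()) {       // Extractor output feeds the queue
                if (seen.add(link)) queue.add(link);
            }
            saved.put(url, content);                            // Handler: keep the original content
        }
        return saved;
    }

    /** Hypothetical racer: a fixed initQueue, no extractor, the queue never grows. */
    public static List<String> race(List<String> initQueue) {
        Deque<String> queue = new ArrayDeque<>(initQueue);
        List<String> results = new ArrayList<>();
        while (!queue.isEmpty()) {
            String name = queue.poll();
            // Handler: assumed rule, replace anything but letters, digits, '.', '_' and '-', then lowercase
            results.add(name.replaceAll("[^A-Za-z0-9._-]", "_").toLowerCase(Locale.ROOT));
        }
        return results;
    }

    public static void main(String[] args) {
        Map<String, String> web = Map.of(
            "/index", "<html><body><a href=\"/a\">A</a><a href=\"http://other/x\">X</a></body></html>",
            "/a", "<html><body><a href=\"/index\">back</a></body></html>");
        // Only site-local links pass the filter; the external link is dropped
        System.out.println(crawl("/index", web, href -> href.startsWith("/")).keySet()); // [/index, /a]
        System.out.println(race(List.of("My File!.txt", "notes-2008.TXT")));
    }
}
```

Keeping the extractor as its own ContentHandler mirrors the proposal above: the parser only turns the stream into SAX events, the link filtering lives in a separate component, and a racer simply leaves that component out.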