On Fri, 7 Dec 2001, Stefano Mazzocchi wrote:

> Bernhard Huber wrote:
> >
> > > > I have even thought that the indexing step might act like the
> > > > profiler. Instead of collecting profile data about how long
> > > > something takes, it would update or create the index information.
> > > > This way the index is kept up-to-date.
> > > > This way no explicit crawling is necessary for the internal docs.
> > >
> > > sorry but I didn't get it.
> >
> > Let me explain it again:
> > As you wrote later, some time-triggered task ensures that
> > the index is kept up-to-date.
> > Now if I want to avoid scanning through all documents and
> > checking whether they have changed since the last index generation,
> > I have to have some other mechanism.
> > One mechanism would be that when an indexed document is requested,
> > it checks whether it is newer than its index.
> > The implementation of this mechanism would be a la profiler,
> > which alters the SAXConnectors, or Pipeline -- I don't know that
> > by heart exactly.
> > Important seems to me:
> > When and who pays for the indexing, and what is the maximum allowed
> > time during which document and index may differ?
> >
> > The simple time-triggered indexer is just one solution.
>
> An internal crawler might connect to the cache information and avoid
> indexing something that is still valid in cache (if it's valid in cache
> and it's already present in the index, then it's valid in the index)
>
> > Another solution is that a serializer, or transformer of a view
> > writes the index.
>
> Hmmm,
>
> > The only problem for the transformer/serializer is
> > to know when to close the IndexWriter if it is creating the index
> > from scratch. Just updating might work inside a transformer/serializer.
> > In that case we still might need some time-triggered task removing
> > the lucene documents of deleted documents of the site.
>
> My SoC alarm started ringing: the sitemap components should have no
> notion of indexing. The entire crawling/indexing/searching phase happens
> externally or we'll have concern overlap.
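
For illustration only, here is a minimal sketch of the on-request check
described in the quoted text: a requested document is reindexed only if it
is newer than its index entry. It uses only the JDK. The class and method
names (StaleIndexCheck, needsReindex, markIndexed) are hypothetical, and
the in-memory map merely stands in for whatever timestamp a real index or
the cache would provide. Nothing here is Cocoon or Lucene API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: remembers when each document was last indexed.
class StaleIndexCheck {
    // Assumption: an in-memory map stands in for timestamps stored with the index.
    private final Map<String, Long> indexedAt = new HashMap<>();

    // True if the document was never indexed, or was modified after it was indexed.
    boolean needsReindex(String uri, long lastModified) {
        Long stamp = indexedAt.get(uri);
        return stamp == null || lastModified > stamp;
    }

    // Record that the document was indexed in the state it had at time 'when'.
    void markIndexed(String uri, long when) {
        indexedAt.put(uri, when);
    }

    public static void main(String[] args) {
        StaleIndexCheck check = new StaleIndexCheck();
        String uri = "/docs/index.xml"; // hypothetical document URI
        System.out.println(check.needsReindex(uri, 100L)); // never indexed yet
        check.markIndexed(uri, 100L);
        System.out.println(check.needsReindex(uri, 100L)); // unchanged since indexing
        System.out.println(check.needsReindex(uri, 150L)); // modified afterwards
    }
}
```

With such a check the request that hits a stale document pays for the
indexing, and documents that are never requested are never reindexed, so
deleted documents would still need a separate cleanup task.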

A Serializer is indeed the wrong place (SoC), but an IndexingTransformer
put before the Serializer in a pipe would keep SoC, don't you think?

> But at the same time, it would be nice to have a synchronous way to
> trigger reindexing of recently modified content (say, a page just
> edited). This could be done by calling a specific behavior on the
> 'cocoon' component (which is the engine).

Exactly, synchronous indexing is key to having an (almost) always
up-to-date index for searching.

> Which leads me to think that making crawling, indexing and searching as
> Avalon components might be FS since we're not going to use any other
> implementation of these....

You don't have to use Avalon only when there are multiple implementations.
There could well be a single implementation for a role. The CM manages its
lifecycle and so on, and every Composer could have access to it. That alone
can justify making something a Component IMO.

Giacomo
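
A rough sketch of the IndexingTransformer idea, assuming plain SAX from
the JDK rather than Cocoon's Transformer interface: a filter that sits in
front of the next pipeline stage, records the character data it sees, and
forwards every event unchanged. The names IndexingFilter, ElementCounter
and IndexingFilterDemo are hypothetical. ElementCounter only stands in
for the Serializer, and handing the collected text to Lucene is left out.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical pass-through filter: collects text for indexing, changes nothing.
class IndexingFilter extends XMLFilterImpl {
    private final StringBuilder text = new StringBuilder();

    IndexingFilter(XMLReader parent) {
        super(parent);
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        text.append(ch, start, length);      // keep a copy for the index
        super.characters(ch, start, length); // forward the event downstream
    }

    String collectedText() {
        return text.toString();
    }
}

// Stand-in for the Serializer: just counts the start tags it receives.
class ElementCounter extends DefaultHandler {
    int elements;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        elements++;
    }
}

class IndexingFilterDemo {
    // Parses xml through the filter into 'downstream' and returns the collected text.
    static String indexText(String xml, ContentHandler downstream) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            IndexingFilter filter = new IndexingFilter(reader);
            filter.setContentHandler(downstream);
            filter.parse(new InputSource(new StringReader(xml)));
            return filter.collectedText();
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
    }

    public static void main(String[] args) {
        ElementCounter serializerStandIn = new ElementCounter();
        String text = indexText(
            "<page><title>Hello</title><body>world</body></page>", serializerStandIn);
        System.out.println("text for index: " + text);
        System.out.println("elements forwarded: " + serializerStandIn.elements);
    }
}
```

The downstream handler never learns that indexing happened, which is the
SoC point: only the filter knows about the index.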