Bernhard Huber wrote:
> > > I have even thought that the indexing step may act like the
> > > profiler. Instead of collecting profile data about how long
> > > something takes, update or create the index information. This
> > > way the index is kept up-to-date.
> > > This way no explicit crawling is necessary for the internal docs.
> >
> > sorry but I didn't get it.
>
> Let me explain it again:
> As you wrote later, some time-triggered task ensures that
> the index is kept up-to-date.
> Now if I want to avoid scanning through all documents and
> checking whether they have changed since the last index generation, I
> have to have some other mechanism.
> One mechanism would be that when a document which is indexed is
> requested, it checks whether it is newer than its index.
> Now the implementation of this mechanism would be a la profiler,
> which alters the SAXConnectors, or Pipeline -- I don't know that
> by heart exactly.
> Important seems to me:
> When and who pays for the indexing, and what is the maximum allowed
> time for which document and index may differ?
>
> The simple time-triggered indexer is just one solution.

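For reference, the check-on-request mechanism quoted above could be sketched roughly as follows. This is only an illustration under assumptions: the class `StaleIndexCheck` and its methods are hypothetical names, not part of Cocoon or Lucene, and the sketch deliberately leaves out where in the pipeline the check would be hooked in.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical helper (not a Cocoon or Lucene class): remembers which
 * version of each document was indexed and reports whether the document
 * has changed since then.
 */
class StaleIndexCheck {
    // URI -> last-modified timestamp (ms) of the version that was indexed
    private final Map<String, Long> indexedVersion = new HashMap<>();

    /** True if the document was never indexed or is newer than its index entry. */
    boolean needsReindex(String uri, long lastModified) {
        Long indexed = indexedVersion.get(uri);
        return indexed == null || lastModified > indexed;
    }

    /** Record that the version with this timestamp has been indexed. */
    void markIndexed(String uri, long lastModified) {
        indexedVersion.put(uri, lastModified);
    }
}
```

With something like this, the request that first hits a modified document is the one that pays for reindexing it, and a document that is never requested stays stale in the index indefinitely, which is exactly the "when and who pays" trade-off raised above.
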
An internal crawler might connect to the cache information and avoid
indexing something that is still valid in the cache (if it's valid in
the cache and it's already present in the index, then it's valid in
the index).

> Another solution is that a serializer, or transformer of a view
> writes the index.

Hmmm,

> The only problem for the transformer/serializer is
> to know when to close the IndexWriter if it is creating the index
> from scratch. Just updating might work inside a transformer/serializer.
> In that case we still might need some time-triggered task removing
> the Lucene documents of deleted documents of the site.

My SoC alarm started ringing: the sitemap components should have no
notion of indexing. The entire crawling/indexing/searching phase
happens externally, or we'll have concern overlap.

But at the same time, it would be nice to have a synchronous way to
trigger reindexing of recently modified content (say, a page just
edited). This could be done by calling a specific behavior on the
'cocoon' component (which is the engine).

Which leads me to think that making crawling, indexing and searching
Avalon components might be FS, since we're not going to use any
other implementation of these....

What do you think?

-- 
Stefano Mazzocchi      One must still have chaos in oneself to be
                          able to give birth to a dancing star.
<[EMAIL PROTECTED]>                             Friedrich Nietzsche