Hello Daan

> Daan Hoogenboezem wrote:
> The document contains a lot of metadata, only part of it is
> relevant to hippo. The metadata that is being extracted by
> hippo, think properties like unique identifiers don't change
> value over time.
>
> The problem that we're seeing when executing a dasl query for
> one of these properties while the indexer is running is not
> that we're getting stale data, we're getting no data. It's
> almost as if the indexer throws out the old entries
> completely, and then recreates them instead of just updating
> any values that changed.
It seems to me that when running a dasl query you can be in one of 3 possible situations:

1) the indexer has not started yet: you get results you would not expect, because you have already changed the documents
2) the indexer has started but not yet finished: the document might already be deleted from the index, but not yet added again because indexing has not finished
3) the indexing has finished: you get the results you expect

So, I suppose, if you run the dasl query again after a few seconds, you have the correct result again, right?

First of all, you must realize that dasls are a search. Searches always have a delay. Publishing content with our repository and a typical frontend has a delay of 3 sec before the cache is invalidated through event-based JMS, so it takes a few seconds before it is live and searchable. We run very high traffic sites with this strategy with hardly any delay. It is IMO a very efficient strategy, but you must realize it is different from a SQL query on a database.

That a document first gets deleted and then added again is search engine specific, but inverted indexes like Lucene work this way in order to have efficient filesystem access. There are probably search engines around that do this differently, but that might cost a lot of performance due to random filesystem access.

So, I am aware that your observed behavior is quite possible. We update the index in batches for performance reasons. The new repository we are building does not have this limitation: it blocks a thread while the indexing queue is not empty and indexes first, before the request is executed.

I am not saying anything about how it should work (for your application). I am also not aware of your project or its project management. If this issue is blocking, I think contacting the project manager from Hippo might be best, to discuss options.
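To make the three situations concrete, here is a minimal Python sketch of a toy inverted index that applies queued updates as delete-then-add. It is only an illustration under my own assumptions: `ToyIndex` and all of its method names are hypothetical, and this is neither Lucene's API nor the actual Hippo indexer code. The `search_blocking` method roughly mimics the idea of draining the indexing queue before a request is executed.

```python
from collections import deque


class ToyIndex:
    """Toy inverted index: term -> set of document ids (hypothetical, not Lucene)."""

    def __init__(self):
        self.postings = {}      # term -> set of doc ids
        self.doc_terms = {}     # doc id -> terms currently indexed for it
        self.pending = deque()  # queued (doc_id, terms) updates for the next batch

    def queue_update(self, doc_id, terms):
        # Record a changed document; the index itself is not touched yet.
        self.pending.append((doc_id, set(terms)))

    def delete(self, doc_id):
        # Remove every posting that points at doc_id.
        for term in self.doc_terms.pop(doc_id, set()):
            self.postings[term].discard(doc_id)

    def add(self, doc_id, terms):
        self.doc_terms[doc_id] = set(terms)
        for term in terms:
            self.postings.setdefault(term, set()).add(doc_id)

    def flush(self):
        # Apply the queued batch: each update is a delete followed by an add.
        while self.pending:
            doc_id, terms = self.pending.popleft()
            self.delete(doc_id)
            self.add(doc_id, terms)

    def search(self, term):
        return sorted(self.postings.get(term, set()))

    def search_blocking(self, term):
        # Assumed variant: index everything that is queued before searching.
        self.flush()
        return self.search(term)


index = ToyIndex()
index.add("doc1", {"uid-42", "draft"})

# Situation 1: the change is queued, the indexer has not run: stale result.
index.queue_update("doc1", {"uid-42", "published"})
before = index.search("draft")      # ['doc1']

# Situation 2: the indexer is halfway: deleted, but not yet added again.
doc_id, terms = index.pending.popleft()
index.delete(doc_id)
during = index.search("uid-42")     # [] although the uid never changed

# Situation 3: indexing finished: the expected result.
index.add(doc_id, terms)
after = index.search("uid-42")      # ['doc1']

# Blocking variant: the queue is drained before the query runs.
index.queue_update("doc1", {"uid-42", "archived"})
blocking = index.search_blocking("archived")   # ['doc1']
```

The `during` result is the "no data" case from your mail: the property did not change, but the whole document is briefly absent from the index between the delete and the add.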
-Ard

********************************************
Hippocms-dev: Hippo CMS development public mailinglist
