Hello Daan 

> Daan Hoogenboezem wrote:
> The document contains a lot of metadata, only part of which is
> relevant to Hippo. The metadata that Hippo extracts (think of
> properties like unique identifiers) doesn't change value over
> time.
> 
> The problem we're seeing when executing a DASL query for one of
> these properties while the indexer is running is not that we're
> getting stale data; we're getting no data. It's almost as if the
> indexer throws out the old entries completely and then recreates
> them, instead of just updating any values that changed.

It seems to me that when running a DASL query there are three possible
situations:

1) The indexer has not started yet: you get results you would not
expect, because you have already changed the documents.
2) The indexer has started but not yet finished: the document might
already be deleted from the index, but not yet added again because
indexing has not finished.
3) Indexing has finished: you get the results you expect.

So I suppose that if you run the DASL query again after a few seconds,
you get the correct result, right?
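
If waiting a few seconds is acceptable, the client could simply retry
an empty result. Below is a minimal sketch in Python. It is not Hippo
code: query_with_retry and the run_query callable are made-up names,
and run_query stands in for whatever executes your DASL query. Note
that it cannot tell a result that is empty because of reindexing from
one that is legitimately empty.

```python
import time


def query_with_retry(run_query, attempts=5, delay_seconds=1.0, sleep=time.sleep):
    """Call run_query() until it returns a non-empty result or attempts run out.

    run_query is a hypothetical callable that executes the search and
    returns a list of hits.
    """
    results = []
    for attempt in range(attempts):
        results = run_query()
        if results:
            return results
        # Empty result: the indexer may be between delete and re-add, so wait
        # before trying again (but not after the final attempt).
        if attempt < attempts - 1:
            sleep(delay_seconds)
    return results
```

The sleep function is a parameter only so the helper can be tested
without real waiting.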

First of all, you must realize that a DASL query is a search, and
searches always have a delay. Publishing content with our repository
and a typical frontend has a delay of 3 seconds before the cache is
invalidated through event-based JMS, so it takes a few seconds before
the content is live and searchable. We run very high-traffic sites
with this strategy with hardly any delay. It is IMO a very efficient
strategy, but you must realize it is different from an SQL query on a
database.

That a document is first deleted and then added again is search-engine
specific, but inverted indexes like Lucene work this way in order to
access the filesystem efficiently. There are probably search engines
around that do this differently, but that might cost a lot of
performance due to random filesystem access.
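
The delete-then-add behavior can be shown with a toy inverted index.
This is only an illustration in Python under simplified assumptions,
not Lucene's API or the repository's indexer; ToyInvertedIndex and its
methods are invented for the example.

```python
from collections import defaultdict


class ToyInvertedIndex:
    """Maps (field, value) terms to the set of document ids containing them."""

    def __init__(self):
        self.postings = defaultdict(set)

    def delete(self, doc_id):
        # Remove the document from every posting list it appears in.
        for ids in self.postings.values():
            ids.discard(doc_id)

    def add(self, doc_id, fields):
        # Register the document under each of its (field, value) terms.
        for field, value in fields.items():
            self.postings[(field, value)].add(doc_id)

    def search(self, field, value):
        return sorted(self.postings.get((field, value), set()))


index = ToyInvertedIndex()
index.add("doc-1", {"uid": "abc", "title": "old"})

# An "update" is a delete followed by an add.
index.delete("doc-1")
during_update = index.search("uid", "abc")  # queried between the two steps
index.add("doc-1", {"uid": "abc", "title": "new"})
after_update = index.search("uid", "abc")
```

Even though the uid never changed, a search that lands between the
delete and the add finds nothing, which matches situation 2 above.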

So I am aware that the behavior you observed is quite possible. We
update the index in batches for performance reasons. The new repository
we are building does not have this limitation: it blocks the request
thread while the indexing queue is not empty and indexes first, before
the request is executed. I am not saying anything about how it should
work for your application. I am also not familiar with your project or
its project management. If this issue is blocking, I think contacting
the project manager from Hippo might be best, to discuss options.
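
To illustrate the difference between batched indexing and indexing
before the request is executed, here is a rough single-threaded sketch
in Python. It is a simplification with invented names (QueuedIndex,
flush_before_search), not the actual design of either repository; a
real implementation would block the request thread rather than flush
inline.

```python
from collections import deque


class QueuedIndex:
    """Toy index whose updates sit in a queue until they are flushed."""

    def __init__(self, flush_before_search):
        self.flush_before_search = flush_before_search
        self.docs = {}          # doc_id -> fields; what searches can see
        self.pending = deque()  # queued (doc_id, fields) updates

    def submit(self, doc_id, fields):
        self.pending.append((doc_id, fields))

    def flush(self):
        # Apply all queued updates, as a batch run of the indexer would.
        while self.pending:
            doc_id, fields = self.pending.popleft()
            self.docs[doc_id] = fields

    def search(self, field, value):
        if self.flush_before_search:
            # Drain the queue first, so the search sees every submitted update.
            self.flush()
        return sorted(d for d, f in self.docs.items() if f.get(field) == value)


batched = QueuedIndex(flush_before_search=False)
blocking = QueuedIndex(flush_before_search=True)
for idx in (batched, blocking):
    idx.submit("doc-1", {"uid": "abc"})

batched_hits = batched.search("uid", "abc")    # queue not flushed yet
blocking_hits = blocking.search("uid", "abc")  # queue drained before searching
```

The batched variant only returns the document after its next flush,
while the other variant pays the indexing cost at query time.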

-Ard

> 
> Daan Hoogenboezem
> 
> 
> ********************************************
> Hippocms-dev: Hippo CMS development public mailinglist
> 