Have you looked at using versioning?
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-index_.html#index-versioning
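Versioning fits this problem if your trusted timestamp is used as the document version. With `version_type=external`, Elasticsearch accepts an index operation only when the supplied version is strictly greater than the stored one. Otherwise it reports a version conflict, so a stale bulk document cannot overwrite a newer live one.

The Python below is a toy in-memory model of that rule, not the Elasticsearch client. The class and method names are hypothetical, and it assumes millisecond timestamps are used as versions. It only models whole-document index operations and does not merge partial updates.

```python
class VersionConflict(Exception):
    """Raised when a write carries a version that is not newer than the stored one."""


class ExternallyVersionedStore:
    """Toy in-memory model of external versioning (hypothetical, not a client API).

    Each document id maps to (version, doc). A write is accepted only when the
    supplied version is strictly greater than the stored one; otherwise it is
    rejected, so a late-arriving older write cannot overwrite newer data.
    """

    def __init__(self):
        self._docs = {}

    def index(self, doc_id, doc, version):
        current = self._docs.get(doc_id)
        if current is not None and version <= current[0]:
            raise VersionConflict(
                f"{doc_id}: version {version} <= stored {current[0]}"
            )
        self._docs[doc_id] = (version, dict(doc))

    def get(self, doc_id):
        return self._docs[doc_id]


store = ExternallyVersionedStore()
store.index("42", {"name": "new", "price": 10}, version=2000)  # live update lands first
try:
    store.index("42", {"name": "old", "price": 9}, version=1000)  # stale bulk document
except VersionConflict:
    pass  # the bulk loader can simply skip this document
print(store.get("42"))  # (2000, {'name': 'new', 'price': 10})
```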
cheers,
Rob
On Thursday, May 1, 2014 2:47:39 PM UTC-7, Michał Zgliczyński wrote:
>
> Hi,
>
> I am building a system in which I will have two sources of updates:
> 1) Bulk updates from the source of truth (the db) <- always inserts of
> complete documents
> 2) Live updates <- inserts and updates (complete and incomplete documents)
>
> Also, let's assume that each insert/update carries a timestamp that we
> trust (not the ES timestamp).
>
> The idea is to have a complete, up-to-date index once the bulk updating
> finishes. To achieve this I need to guarantee that the index ends up with
> the correct data. This would mostly work if all of our operations were
> upserts and the inserts/updates reached ES with strictly increasing
> timestamps. But one can imagine a problematic situation:
>
> 1) We are performing bulk indexing:
> a) we read an object from the db,
> b) process it,
> c) send it to ES.
> 2) An update to the same object happens after step (a) and reaches ES
> before the bulk-updating phase (c) does. That is, ES gets an update with
> new data, and only after that does it get the insert of the entire
> document from the source of truth, which carries older data. Hence ES
> already holds a document with a newer timestamp than the one being added
> in phase (c).
>
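The interleaving in steps 1 and 2 can be reproduced in a few lines of Python. This is a toy model: a dict stands in for the index, and `apply_naive` is a hypothetical helper. It assumes writes are applied in arrival order with last-write-wins semantics.

```python
def apply_naive(index, doc_id, doc):
    """Last-write-wins: whatever arrives last replaces the stored document."""
    index[doc_id] = dict(doc)


index = {}
# (a) the bulk loader reads the row from the db at t=100 ...
bulk_doc = {"id": 7, "status": "pending", "timestamp": 100}
# ... meanwhile a live update made at t=105 reaches Elasticsearch first
live_update = {"id": 7, "status": "shipped", "timestamp": 105}
apply_naive(index, 7, live_update)
# (c) the bulk document, read before the update, arrives afterwards
apply_naive(index, 7, bulk_doc)

print(index[7]["status"])  # pending: the newer "shipped" value was lost
```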
> My theoretical solution: for each operation, keep the timestamp of that
> change (the timestamp from the system that made the change, not from
> Elasticsearch). Let's say that all of the operations we perform are
> upserts.
> Then, once we get an insert or an update (let's call it doc), we perform
> the following script (pseudo-MVEL) inside ES:
> {
>   if (doc.timestamp > ctx._source.timestamp) {
>     // doc is newer than what is in ES
>     upsert(doc); // update the stored document with all of the fields from doc
>   } else {
>     // ES already holds a document with a newer timestamp; note that it
>     // may be an incomplete document (a partial update)
>     __fill the missing fields in the document in ES with values from doc__
>   }
> }
>
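The logic of the pseudo-MVEL script can be checked outside Elasticsearch first. Below is a minimal Python sketch on plain dicts. The function name and the `timestamp` field name are assumptions, and the sketch says nothing about how the script would be deployed inside ES.

One design choice is worth noting. The "newer" branch merges the incoming fields over the stored ones rather than replacing the whole document, so a newer partial update does not drop fields it does not mention. Equal timestamps fall into the fill-only branch.

```python
def merge_by_timestamp(existing, incoming):
    """Sketch of the proposed upsert rule on plain dicts.

    existing: the currently stored document, or None if absent.
    incoming: a complete or partial document carrying a "timestamp" field.
    Returns the document that should be stored.
    """
    if existing is None:
        # Nothing stored yet: plain insert.
        return dict(incoming)
    if incoming["timestamp"] > existing["timestamp"]:
        # Incoming change is newer: its fields win, fields it omits are kept.
        merged = dict(existing)
        merged.update(incoming)
        return merged
    # Stored document is newer (possibly a partial update): only add fields
    # missing from it, never overwrite its newer values or timestamp.
    merged = dict(existing)
    for key, value in incoming.items():
        merged.setdefault(key, value)
    return merged


stored = {"id": 7, "status": "shipped", "timestamp": 105}  # partial live update
bulk = {"id": 7, "status": "pending", "price": 30, "timestamp": 100}
print(merge_by_timestamp(stored, bulk))
# {'id': 7, 'status': 'shipped', 'timestamp': 105, 'price': 30}
```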
> My questions are:
> 1) Is there a better approach?
> 2) Is there a simple way to perform the '__fill the missing fields in the
> document in ES with values from doc__' operation/script?
>
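On question 2: for a document whose fields are all top-level, filling the gaps just means adding each absent key. Nested objects need a recursive version.

The Python below is a hedged sketch of that recursive fill, with a hypothetical function name. It models the logic only. Porting it to whichever scripting language your Elasticsearch version runs is a separate step.

```python
import copy


def fill_missing(target, source):
    """Copy into `target` every field of `source` that `target` lacks.

    Existing values in `target` are never overwritten. When both sides hold a
    dict under the same key, the fill recurses into it, so partially present
    nested objects get completed too. Lists and other values are treated as
    opaque (kept from `target` if present). Mutates and returns `target`.
    """
    for key, value in source.items():
        if key not in target:
            # Deep-copy so target and source never share nested objects.
            target[key] = copy.deepcopy(value)
        elif isinstance(target[key], dict) and isinstance(value, dict):
            fill_missing(target[key], value)
    return target


stored = {"status": "shipped", "address": {"city": "Warsaw"}}
older = {"status": "pending", "address": {"city": "Krakow", "zip": "30-001"}, "price": 30}
print(fill_missing(stored, older))
# {'status': 'shipped', 'address': {'city': 'Warsaw', 'zip': '30-001'}, 'price': 30}
```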
> Thanks!
> Michal Zgliczynski
>
--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/b988d556-fd38-4f85-9214-2d471558b778%40googlegroups.com.