Hi,

I am building a system that will have two sources of updates:
1) Bulk updates from the source of truth (the db): these are always inserts 
of complete documents.
2) Live updates: these are inserts and updates (complete and incomplete docs).

Also, let's assume that each insert/update carries a timestamp that we trust 
(our own timestamp, not one assigned by ES).

The idea is to have a complete, up-to-date index once the bulk updating 
finishes. To achieve this I need to guarantee that the index ends up with 
the correct data. This would mostly work if every operation were an upsert 
and the inserts/updates arrived at ES with strictly increasing timestamps.
But one can imagine a problematic situation:

1) We are performing bulk indexing,
  a) we read an object from the db
  b) process it
  c) send it to ES.
2) An update to the same object arrives after step (a) and before the bulk 
copy makes it to ES in step (c). That is, ES gets an update with new data, 
and only after that does it get the insert with the entire document from 
the source of truth, carrying older data. Hence, when step (c) arrives, ES 
already holds a document with a newer timestamp than the one being inserted.
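
To make the race concrete, here is a minimal Python sketch. It does not talk to ES at all: the dict `index`, the function `naive_index`, and the field names `name` and `price` are all made up for illustration. It only shows what happens if writes are applied in arrival order without looking at the timestamp.

```python
# Hypothetical in-memory stand-in for the index: doc id -> stored document.
index = {}

def naive_index(doc_id, doc):
    """Last write wins by arrival order; the source timestamp is ignored."""
    index[doc_id] = dict(doc)

# (a) the bulk job reads the object from the db (change timestamp 100)
bulk_snapshot = {"timestamp": 100, "name": "old name", "price": 10}

# A live update for the same object reaches the index first (timestamp 200).
naive_index("1", {"timestamp": 200, "name": "new name"})

# (c) the bulk job's older snapshot arrives afterwards and overwrites it.
naive_index("1", bulk_snapshot)

print(index["1"])  # the newer "name" from the live update has been lost
```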

My theoretical solution: attach to each operation the timestamp of that 
change (from the system that made the change, not from Elasticsearch). 
Let's say that all of the operations we perform are upserts.
Then, once we get an insert or an update (let's call it doc), we run the 
following script (pseudo-MVEL) inside ES.
{
  if (doc.timestamp > ctx._source.timestamp) {
    // doc is newer than what was in ES
    upsert(doc); // update the index with all of the info from the new doc
  } else {
    // there is already a document in ES with a newer timestamp; note that
    // it may be an incomplete document (an update)
    __fill the missing fields in the document in ES with values from doc__
  }
}
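
The same merge rule, written as a plain Python function so the intended behaviour can be checked outside ES. This is only a sketch of the logic, not an ES script: `merge` and the field names are hypothetical, documents are assumed to be flat dicts, and ties on the timestamp are resolved in favour of what is already stored.

```python
def merge(stored, incoming):
    """Return the document to keep after `incoming` arrives.

    Both documents are flat dicts with a "timestamp" key.
    `stored` is None when the index has no document for this id yet.
    """
    if stored is None:
        return dict(incoming)
    if incoming["timestamp"] > stored["timestamp"]:
        # incoming is newer: its fields win; fields it lacks stay as stored
        merged = dict(stored)
        merged.update(incoming)
    else:
        # stored is newer (or equal): keep its fields and only fill in
        # the fields it is missing with values from incoming
        merged = dict(incoming)
        merged.update(stored)
    return merged

# Live update arrives first, then the older complete doc from the bulk job.
stored = merge(None, {"timestamp": 200, "name": "new name"})
stored = merge(stored, {"timestamp": 100, "name": "old name", "price": 10})
print(stored)
```

In this run the stored document keeps the newer `name` and timestamp and picks up `price` from the older bulk document. With a single timestamp per document, a field filled in this way may itself be older than the stored timestamp suggests.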

My questions are:
1) Is there a better approach?
2) If so, is there a simple approach for doing the ' __fill the missing 
fields in the document in ES with values from doc__' operation/script?

Thanks!
Michal Zgliczynski
