Hi,

Thank you for your response. I have looked through this blog post:
http://www.elasticsearch.org/blog/elasticsearch-versioning-support/

It looks as if external versioning would be the way to go: have the timestamps act as version numbers and let ES accept only the document with the newest version as the correct one. However, in the situation I described above, ES will reject the write. A quote from the post:

"With version_type set to external, Elasticsearch will store the version number as given and will not increment it. Also, instead of checking for an exact match, Elasticsearch will only return a version collision error if the version currently stored is greater or equal to the one in the indexing command. This effectively means “only store this information if no one else has supplied the same or a more recent version in the meantime”. Concretely, the above request will succeed if the stored version number is smaller than 526. 526 and above will cause the request to fail."
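
To make the quoted rule concrete, here is a minimal sketch in Python. It is a toy in-memory model of the check the blog post describes, not an Elasticsearch client call; the names external_version_write, VersionConflict and store are made up for illustration, and the timestamps 200 and 100 are arbitrary.

```python
class VersionConflict(Exception):
    """Raised when the incoming version is not newer than the stored one."""


def external_version_write(store, doc_id, doc, version):
    """Apply the quoted external-versioning rule to a plain dict.

    `store` maps doc_id -> (version, doc). Hypothetical helper: it only
    models the "stored version >= incoming version" rejection.
    """
    current = store.get(doc_id)
    if current is not None and current[0] >= version:
        raise VersionConflict(
            f"stored version {current[0]} >= incoming version {version}")
    # The version is stored as given and is not incremented.
    store[doc_id] = (version, doc)


store = {}
# The live partial update arrives first and carries the later timestamp.
external_version_write(store, "42", {"name": "new name"}, version=200)
# The complete document from the bulk load arrives second, with an older timestamp.
try:
    external_version_write(
        store, "42", {"name": "old name", "city": "SF"}, version=100)
except VersionConflict as exc:
    print("rejected:", exc)
```

The second write is rejected outright, so the fields that only the complete document carries never reach the index.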
In my example, we would have exactly that situation: a partial doc with a larger version number (later timestamp) is already stored in ES, and then we receive the complete document with a smaller timestamp. In this situation we would like to merge the two documents so that we keep all of the fields from the partial doc and fill the other fields (those not currently present in the ES document) from the complete document.

Thanks!
Michal Zgliczynski

On Thursday, May 1, 2014 14:58:31 UTC-7, Rob Ottaway wrote:
>
> Have you looked at using versioning?
>
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-index_.html#index-versioning
>
> cheers,
> Rob
>
> On Thursday, May 1, 2014 2:47:39 PM UTC-7, Michał Zgliczyński wrote:
>>
>> Hi,
>>
>> I am building a system in which I will have two sources of updates:
>> 1) Bulk updating from the source of truth (db) <- always inserting
>> documents (complete docs)
>> 2) Live updates <- inserts and updates (complete and incomplete docs)
>>
>> Also, let's assume that each insert/update has a timestamp that we
>> trust (not the ES timestamp).
>>
>> The idea is to have a complete, up-to-date index once the bulk updating
>> finishes. To achieve this I need to guarantee that I will have the correct
>> data. This would mostly work well if everything we did were upserts and
>> the inserts/updates coming into ES had strictly increasing timestamps.
>> But one could imagine a problematic situation:
>>
>> 1) We are performing bulk indexing:
>> a) we read an object from the db
>> b) process it
>> c) send it to ES.
>> 2) We have an update on the same object after step (a) and before it
>> makes it to ES in the bulk updating, phase (c). That is, ES gets an update
>> with new data, and only after that do we get the insert with the entire
>> document from the source of truth with older data. Hence, in ES we have a
>> document with a newer timestamp than the one newly added in phase (c).
>>
>> My theoretical solution: for each operation, have the timestamp for that
>> change (the timestamp from the system that made the change, not from
>> Elasticsearch). Let's say that all of the operations that we will perform
>> are upserts.
>> Then, once we get an insert or an update (let's call it doc), we have to
>> perform the following script (pseudo mvel) inside ES.
>> {
>>   if (doc.timestamp > ctx.source.timestamp) {
>>     // doc is newer than what was in ES
>>     upsert(doc); // update the index with all of the info from the new doc
>>   } else {
>>     // there is already a document in ES with a newer timestamp; note,
>>     // this may be an incomplete document (an update)
>>     __fill the missing fields in the document in ES with values from doc__
>>   }
>> }
>>
>> My question is:
>> 1) Is there a better approach?
>> 2) If so, is there a simple approach for doing the '__fill the missing
>> fields in the document in ES with values from doc__' operation/script?
>>
>> Thanks!
>> Michal Zgliczynski
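
For reference, the merge that the pseudo mvel script is after can be sketched in plain Python on dicts. This is only an illustration of the intended semantics, not an ES script or API call: merge_by_timestamp is a hypothetical name, each document is assumed to carry a single comparable "timestamp" field, and the merge is shallow (nested objects are not merged field by field).

```python
def merge_by_timestamp(stored, incoming):
    """Merge two versions of a document, letting the newer one win per field.

    Both arguments are plain dicts with a 'timestamp' key; `stored` may be
    None when nothing is indexed yet. Fields present only in the older
    document are kept, which covers the "fill the missing fields" branch.
    """
    if stored is None:
        return dict(incoming)
    if incoming["timestamp"] > stored["timestamp"]:
        older, newer = stored, incoming
    else:
        # On equal timestamps the stored document wins.
        older, newer = incoming, stored
    merged = dict(older)   # start from the older document
    merged.update(newer)   # newer values (and the newer timestamp) override
    return merged


# The partial live update is already stored with the later timestamp.
stored = {"timestamp": 200, "name": "new name"}
# The complete document from the bulk load arrives afterwards.
complete = {"timestamp": 100, "name": "old name", "city": "SF"}

print(merge_by_timestamp(stored, complete))
```

Both branches of the script reduce to the same operation here: copy the older document and overlay the newer one. With only one timestamp per document, a field that was changed by an even older partial update cannot be told apart from a newer one; that would need per-field timestamps.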
