Hi,
Thank you for your response. I have looked through this blog 
post: http://www.elasticsearch.org/blog/elasticsearch-versioning-support/
It looks as if external versioning would be the way to go: have the 
timestamps act as version numbers and let ES keep only the document with 
the newest version. However, in the situation I presented above, this 
approach fails. A quote from the post:
"With version_type set to external, Elasticsearch will store the version 
number as given and will not increment it. Also, instead of checking for an 
exact match, Elasticsearch will only return a version collision error if 
the version currently stored is greater or equal to the one in the indexing 
command. This effectively means “only store this information if no one else 
has supplied the same or a more recent version in the meantime”. 
Concretely, the above request will succeed if the stored version number is 
smaller than 526. 526 and above will cause the request to fail."
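To make the quoted rule concrete, here is a minimal Python sketch that replays the scenario from my original message against an in-memory stand-in for an index using version_type=external. The class `ExternalVersionStore`, the exception `VersionConflict`, and the timestamp values are all made up for illustration. This is not an Elasticsearch client API.

```python
class VersionConflict(Exception):
    """Raised when the stored version is >= the incoming external version."""


class ExternalVersionStore:
    """Hypothetical in-memory stand-in for an index using version_type=external."""

    def __init__(self):
        self._docs = {}  # doc_id -> (version, source)

    def index(self, doc_id, source, version):
        current = self._docs.get(doc_id)
        # Per the quoted rule: collision if the stored version is greater or equal.
        if current is not None and current[0] >= version:
            raise VersionConflict(f"{doc_id}: stored {current[0]} >= {version}")
        # The version is stored exactly as given; it is never incremented.
        self._docs[doc_id] = (version, dict(source))

    def get(self, doc_id):
        return self._docs[doc_id]


store = ExternalVersionStore()
# The live partial update arrives first, carrying the later timestamp as its version.
store.index("42", {"price": 10}, version=1399000200)
# The complete document from the bulk load carries an earlier timestamp.
try:
    store.index("42", {"name": "widget", "price": 9}, version=1399000100)
except VersionConflict as exc:
    print("rejected:", exc)  # prints: rejected: 42: stored 1399000200 >= 1399000100
```

The complete document is rejected outright, so its `name` field never reaches the index. External versioning alone can only accept or reject a write. It cannot merge two documents.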

My example produces exactly this situation: a partial doc with a larger 
version number (later timestamp) is already stored in ES, and then we 
receive the complete document with a smaller timestamp. In this case we 
would like to merge the two documents so that we keep all of the fields 
from the partial doc and fill in the remaining fields (those not currently 
present in the ES document) from the complete document.
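The desired merge can be stated as a simple field-precedence rule. Here is a minimal Python sketch, assuming both documents are flat dicts of field to value. Nested objects would need a recursive merge. The function name is hypothetical.

```python
def merge_partial_over_complete(partial_newer, complete_older):
    """Keep every field of the newer partial doc; fill the rest from the older complete doc.

    Assumes flat dicts (field -> value); nested objects are not merged recursively.
    """
    # In dict unpacking, later entries win, so the partial doc's fields take precedence.
    return {**complete_older, **partial_newer}


complete = {"name": "widget", "price": 9, "timestamp": 100}
partial = {"price": 10, "timestamp": 200}
merged = merge_partial_over_complete(partial, complete)
print(merged)  # {'name': 'widget', 'price': 10, 'timestamp': 200}
```

The merged document also keeps the partial doc's later timestamp. Any write that arrives afterwards is therefore still compared against the newest change.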

Thanks!
Michal Zgliczynski

On Thursday, May 1, 2014 2:58:31 PM UTC-7, Rob Ottaway wrote:
>
> Have you looked at using versioning?
>
>
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-index_.html#index-versioning
>
> cheers,
> Rob
>
> On Thursday, May 1, 2014 2:47:39 PM UTC-7, Michał Zgliczyński wrote:
>>
>> Hi,
>>
>> I am building a system in which I will have two sources of updates:
>> 1) Bulk updating from the source of truth (db) <- always inserting 
>> documents (complete docs)
>> 2) Live updates <- inserts and updates (complete and incomplete docs)
>>
>> Also, let's assume that each insert/update has a timestamp which we 
>> trust (not an ES timestamp).
>>
>> The idea is to have a complete, up-to-date index once the bulk updating 
>> finishes. To achieve this I need to guarantee that the index holds the 
>> correct data. This would mostly work if everything we did were upserts and 
>> the inserts/updates coming into ES had strictly increasing timestamps.
>> But one can imagine a problematic situation, when:
>>
>> 1) We are performing bulk indexing,
>>   a) we read an object from the db
>>   b) process it
>>   c) send it to ES.
>> 2) We have an update on the same object after step (a) and before it 
>> makes it to ES in the bulk updating phase (c). That is, ES gets an update 
>> with new data, and only after that do we get the insert of the entire 
>> document from the source of truth with older data. Hence, ES already has a 
>> document with a newer timestamp than the one added in phase (c).
>>
>> My theoretical solution: for each operation, keep the timestamp of that 
>> change (the timestamp from the system that made the change, not from 
>> Elasticsearch). Let's say that all of the operations we perform are 
>> upserts.
>> Then, when we get an insert or an update (let's call it doc), we 
>> perform the following script (pseudo-MVEL) inside ES:
>> {
>>   if (doc.timestamp > ctx._source.timestamp) {
>>     // doc is newer than what is in ES
>>     upsert(doc); // update the index with all of the info from the new doc
>>   } else {
>>     // there is already a document in ES with a newer timestamp; note that
>>     // it may be an incomplete document (an update)
>>     __fill the missing fields in the document in ES with values from doc__
>>   }
>> }
>>
>> My questions are:
>> 1) Is there a better approach?
>> 2) If so, is there a simple approach for doing the ' __fill the missing 
>> fields in the document in ES with values from doc__' operation/script?
>>
>> Thanks!
>> Michal Zgliczynski
>>
>
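The decision logic of the pseudo-script in the quoted question can be sketched in Python. This shows what the script is meant to compute per document. It is not how it would run inside Elasticsearch, where the same logic would have to be written in the cluster's scripting language against `ctx._source`. The sketch assumes every incoming doc, partial or complete, carries a `timestamp` field. The function name `timestamped_upsert` is hypothetical.

```python
def timestamped_upsert(stored, doc):
    """Return the document to keep, given the stored source (or None) and an incoming doc.

    A newer doc overwrites the fields it carries; an older (or equally old) doc
    only fills fields the stored, possibly partial, document does not have yet.
    """
    if stored is None:
        # Nothing indexed yet: plain insert.
        return dict(doc)
    result = dict(stored)
    if doc["timestamp"] > stored["timestamp"]:
        # Incoming doc is newer: its fields, including the timestamp, win.
        result.update(doc)
    else:
        # Stored doc is newer: only fill in the missing fields.
        for field, value in doc.items():
            result.setdefault(field, value)
    return result


# Live partial update arrives first, then the stale complete doc from the bulk load.
stored = timestamped_upsert(None, {"price": 10, "timestamp": 200})
stored = timestamped_upsert(stored, {"name": "widget", "price": 9, "timestamp": 100})
print(stored)  # {'price': 10, 'timestamp': 200, 'name': 'widget'}
```

A doc with an equal timestamp falls into the fill-only branch. Whether ties should overwrite instead is a design choice that the original pseudo-script leaves open.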

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/550bbb9d-b320-41a8-82d7-5c663c1f7e71%40googlegroups.com.