I missed that the later doc would only be partial. What is the reason to 
use the partial doc? That really complicates things.

Filling in missing fields is going to be a very large headache. You'll 
probably kill performance trying to do it, too. It will likely be so complex 
that it causes more trouble than it solves.
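
To make the problem concrete, here is a rough Python sketch of the merge being asked for, done on the client side rather than in an ES script. It is only an illustration under my own assumptions: `merge_docs` and the `timestamp` field name are made up for this example, a field counts as missing when its key is absent, and ties go to the stored document.

```python
from typing import Any, Dict, Optional

Doc = Dict[str, Any]


def merge_docs(stored: Optional[Doc], incoming: Doc,
               ts_field: str = "timestamp") -> Doc:
    """Merge `incoming` into `stored`; the newer document wins per field.

    Hypothetical helper for illustration, not part of any ES client.
    """
    if stored is None:
        # Nothing indexed yet: this is a plain insert.
        return dict(incoming)
    if incoming[ts_field] > stored[ts_field]:
        # Incoming is newer: its fields override, stored fills the rest.
        return {**stored, **incoming}
    # Stored is newer (possibly a partial update): keep its fields and
    # timestamp, and only fill in the fields it does not have yet.
    return {**incoming, **stored}


# A partial live update (newer) is already stored ...
stored = {"id": 1, "price": 12, "timestamp": 200}
# ... and the complete but older doc from the bulk load arrives afterwards.
incoming = {"id": 1, "name": "widget", "price": 10, "timestamp": 100}

merged = merge_docs(stored, incoming)
print(merged)
```

Even this simple version needs the stored document to be read before every write, and the read and the write would have to be protected against concurrent updates, which is presumably where the complexity and the performance cost come from.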

I think if you describe the overall use cases in more detail, you will get 
better insight into how to work this out.


On Thursday, May 1, 2014 4:51:03 PM UTC-7, Michał Zgliczyński wrote:
>
> Hi,
> Thank you for your response. I have looked through this blog post: 
> http://www.elasticsearch.org/blog/elasticsearch-versioning-support/
> It looks as if external versioning would be the way to go: have the 
> timestamps act as version numbers and let ES keep only the document with 
> the newest version. However, in the situation I described, the indexing 
> request will fail. A quote from the post:
> "With version_type set to external, Elasticsearch will store the version 
> number as given and will not increment it. Also, instead of checking for an 
> exact match, Elasticsearch will only return a version collision error if 
> the version currently stored is greater or equal to the one in the indexing 
> command. This effectively means “only store this information if no one else 
> has supplied the same or a more recent version in the meantime”. 
> Concretely, the above request will succeed if the stored version number is 
> smaller than 526. 526 and above will cause the request to fail."
>
> In my example, we would have exactly that situation. A partial doc with a 
> larger version number (later timestamp) is already stored in ES, and then 
> we get the complete document with a smaller timestamp. In this situation we 
> would like to merge the two documents so that all fields from the partial 
> doc are kept and the remaining fields (those not yet present in the ES 
> document) are filled in from the complete document.
>
> Thanks!
> Michal Zgliczynski
>
> On Thursday, May 1, 2014 2:58:31 PM UTC-7, Rob Ottaway wrote:
>>
>> Have you looked at using versioning?
>>
>>
>> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-index_.html#index-versioning
>>
>> cheers,
>> Rob
>>
>> On Thursday, May 1, 2014 2:47:39 PM UTC-7, Michał Zgliczyński wrote:
>>>
>>> Hi,
>>>
>>> I am building a system in which I will have two sources of updates:
>>> 1) Bulk updates from the source of truth (db) <- always inserts of 
>>> complete docs
>>> 2) Live updates <- inserts and updates (complete and incomplete docs)
>>>
>>> Also, let's assume that each insert/update has a timestamp that we 
>>> trust (our own, not the ES timestamp).
>>>
>>> The idea is to have a complete, up-to-date index once the bulk updating 
>>> finishes. To achieve this I need to guarantee that I end up with the 
>>> correct data. This would mostly work if everything we did were an upsert 
>>> and the inserts/updates coming into ES had strictly increasing 
>>> timestamps. But one can imagine the following problematic situation:
>>>
>>> 1) We are performing bulk indexing,
>>>   a) we read an object from the db
>>>   b) process it
>>>   c) send it to ES.
>>> 2) We get a live update on the same object after step (a) and before it 
>>> makes it to ES in the bulk update, step (c). That is, ES gets an update 
>>> with new data, and only after that do we get the insert with the entire 
>>> document from the source of truth, carrying older data. Hence ES holds a 
>>> document with a newer timestamp than the one newly added in step (c).
>>>
>>> My theoretical solution: for each operation, keep the timestamp of that 
>>> change (from the system that made the change, not from Elasticsearch). 
>>> Let's say that all of the operations we perform are upserts.
>>> Then, once we get an insert or an update (let's call it doc), we run the 
>>> following script (pseudo-MVEL) inside ES.
>>> {
>>>   if (doc.timestamp > ctx._source.timestamp) {
>>>     // doc is newer than what was in ES
>>>     upsert(doc); // update the index with all of the info from the new doc
>>>   } else {
>>>     // there is already a document in ES with a newer timestamp; note that
>>>     // this may be an incomplete document (an update)
>>>     __fill the missing fields in the document in ES with values from doc__
>>>   }
>>> }
>>>
>>> My questions are:
>>> 1) Is there a better approach?
>>> 2) If so, is there a simple approach for doing the '__fill the missing 
>>> fields in the document in ES with values from doc__' operation/script?
>>>
>>> Thanks!
>>> Michal Zgliczynski
>>>
>>
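
The external-versioning rule quoted from the blog post, and why it rejects the late-arriving complete document, can be illustrated with a toy in-memory model. This is not the Elasticsearch API; `index_external`, `VersionConflict`, and the sample values are hypothetical names and data used only for this sketch, with timestamps standing in as version numbers.

```python
from typing import Any, Dict, Tuple

# doc_id -> (version, document); a stand-in for the index, not real ES.
Store = Dict[str, Tuple[int, Dict[str, Any]]]


class VersionConflict(Exception):
    """Raised when the stored version is >= the supplied external version."""


def index_external(store: Store, doc_id: str,
                   doc: Dict[str, Any], version: int) -> None:
    """Store `doc` only if no equal or newer version is already stored."""
    current = store.get(doc_id)
    if current is not None and current[0] >= version:
        raise VersionConflict(
            f"stored version {current[0]} >= supplied version {version}")
    # The version is stored as given and is not incremented.
    store[doc_id] = (version, doc)


store: Store = {}

# The live partial update (timestamp 200) reaches the index first.
index_external(store, "42", {"price": 12}, version=200)

# The complete doc read earlier from the db (timestamp 100) arrives late.
try:
    index_external(store, "42", {"name": "widget", "price": 10}, version=100)
    rejected = False
except VersionConflict:
    rejected = True

print(rejected, store["42"])
```

The second call fails because the stored version (200) is greater than the supplied one (100), so the older complete document never gets a chance to fill in the missing fields.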

