I am trying to implement a system for an object that will be updated very frequently. My idea is to turn the updates into inserts, then have a batch job that applies the inserts in batches to update the highly writable object. The inserts can be sorted either by time or by some sort of incremented identifier. The identifier or timestamp of the last applied insert can be stored on the highly writable object, so the next time the job runs it knows where to start the next batch.
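To make the idea concrete, here is a minimal in-memory sketch of the scheme. It is not App Engine code: the names `Insert`, `HotObject`, `run_batch`, `seq`, `delta` and `last_seq` are made up for illustration, and a plain list stands in for the datastore query.

```python
from dataclasses import dataclass


@dataclass
class Insert:
    seq: int    # hypothetical ordering key: a timestamp or an incremented identifier
    delta: int  # hypothetical payload: the change to apply to the object


@dataclass
class HotObject:
    value: int = 0
    last_seq: int = 0  # ordering key of the last insert that was applied


def run_batch(obj, inserts, batch_size=100):
    """Apply up to batch_size pending inserts in order and return how many were applied."""
    # Stand-in for a query such as "seq > last_seq ORDER BY seq LIMIT batch_size".
    pending = sorted((i for i in inserts if i.seq > obj.last_seq), key=lambda i: i.seq)
    batch = pending[:batch_size]
    for ins in batch:
        obj.value += ins.delta
        obj.last_seq = ins.seq  # remember where the next run should start
    return len(batch)
```

Each run picks up where the previous one stopped, so the object itself is written once per batch rather than once per update. The sketch assumes every pending insert is visible to the query, which is exactly the assumption that eventual consistency breaks.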
Using a timestamp, I am running into a problem with eventual consistency. When I query for inserts to execute, some inserts might be missing from the results because they have not been written to the index yet. Suppose we have inserts A, B and C. If only A and C make it into the batch job, it will mark all work up to C as completed and B will never be executed.

Using incremented identifiers seems like it would solve the problem, but it is not clear how to implement such an identifier. To explain why it would solve the original problem: we would be able to detect that we went from A straight to C, because the difference between their identifiers would be greater than 1. The sharded counter is great for counting, but it is not good to use as a unique identifier given eventual consistency. I can use the memcache increment function, but the counter might be flushed out of memory at any time. I believe the memcache update speed should be enough for what I want to do.

If I had an upper bound on the eventual consistency delay, I could build my system so that it only processes inserts older than that limit.

Anyway, those are my thoughts and any feedback is appreciated.

BTW: The inserts processed in batches are assumed not to depend on each other.
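Here is a rough sketch of the two workarounds I have in mind, in plain Python with lists standing in for query results. The function names `contiguous_prefix` and `older_than` and the `safety_window` parameter are made up for illustration. The second function also assumes that an upper bound on the consistency delay exists, which I do not actually know.

```python
def contiguous_prefix(last_id, visible_ids):
    """Return the ids that are safe to process: last_id + 1, last_id + 2, ... up to the first gap."""
    ready = []
    expected = last_id + 1
    for ident in sorted(visible_ids):
        if ident < expected:
            continue  # already processed, or a duplicate in the results
        if ident != expected:
            break  # gap: an earlier insert is not visible yet, so stop here
        ready.append(ident)
        expected += 1
    return ready


def older_than(inserts, now, safety_window):
    """Return (timestamp, payload) pairs at least safety_window old, sorted by timestamp.

    safety_window is the assumed upper bound on the eventual consistency delay.
    """
    cutoff = now - safety_window
    return sorted((item for item in inserts if item[0] <= cutoff), key=lambda item: item[0])
```

With identifiers A=1, B=2, C=3, a query that returns only A and C gives `contiguous_prefix(0, [1, 3]) == [1]`. The job processes A, stores 1 as its position, and picks up B and C on a later run once B becomes visible.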
