Ravi,

Thanks for the feedback. I was thinking along exactly those lines. The only problem I see is that I plan on processing multiple inserts in one batch job, so the inserts and the highly updated object will not be updatable in a single transaction. There might therefore be situations where an insert was processed but the flag was not set (or the row was not deleted). To overcome this, I am going to either make sure that processing an insert multiple times does not affect the output, or accept a small percentage of failures.
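For the first option, a minimal sketch in plain Python (no datastore code; the field names and structure are my own, not from this thread) of what "processing an insert multiple times does not affect the output" could look like, by remembering which insert ids have already been applied:

```python
# Idempotent apply: replaying the same batch leaves the aggregate unchanged,
# so a batch job that crashed after applying some inserts can simply rerun.
# "counter" and the insert fields are illustrative names.

def apply_inserts(counter, inserts):
    """Apply pending inserts to an aggregate; re-applying is harmless."""
    for ins in inserts:
        if ins["id"] in counter["applied_ids"]:
            continue  # already processed in an earlier (partially failed) run
        counter["value"] += ins["delta"]
        counter["applied_ids"].add(ins["id"])
    return counter

counter = {"value": 0, "applied_ids": set()}
batch = [{"id": 1, "delta": 5}, {"id": 2, "delta": 3}]
apply_inserts(counter, batch)
apply_inserts(counter, batch)  # replay the same batch: no double-count
```

The trade-off is that the set of applied ids has to be stored somewhere (and eventually pruned), which is why accepting a small failure rate may be the simpler choice.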
Ravneet

On Jun 15, 5:18 am, Ravi Sharma <[email protected]> wrote:
> If A, B and C are not dependent on each other and ordering doesn't matter
> for you (e.g. processing C, A, B is also fine), then you can put another
> column in this insert table, say "processed".
> When inserting, make it N (if a string) or false (if a boolean),
> and query the entity based on this column.
> Whenever you process one row, set the value to Y or true, and carry on
> with the next insert.
>
> Or you can even delete these rows once you have processed them; then you
> will not need the extra column.
>
> Note: I am assuming that for one update you will be processing all its
> inserts in one task or job, with no multiprocessing.
>
> On Wed, Jun 15, 2011 at 4:20 AM, thecheatah <[email protected]> wrote:
> > I am trying to implement a system for an object that will be updated a
> > lot. The way I was thinking was to turn the updates into inserts, then
> > have a batch job that executes the inserts in batches to update the
> > highly writable object. The inserts can be ordered either by time or by
> > some sort of incrementing identifier. This identifier or timestamp can
> > be stored on the highly writable object, so the next time the job runs
> > it knows where to start the next batch.
> >
> > Using a timestamp, I run into a problem with eventual consistency.
> > When I query for inserts to execute, some inserts might not make it
> > into the results because they have not been added to the index yet. So
> > suppose we have inserts A, B and C. If only A and C make it into the
> > batch job, it will mark all work up to C as completed and B will never
> > be executed.
> >
> > Using incrementing identifiers seems like it would solve the problem,
> > but how to implement such an identifier is itself not clear. To explain
> > why it would solve the original problem: we would be able to detect
> > that we went from A to C, because the difference in the identifiers
> > would be greater than 1.
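The gap-detection idea in the quoted post can be sketched in a few lines of plain Python (names are illustrative, not from the thread): the batch job only advances past the longest contiguous run of ids, so a missing id caused by index lag is retried on the next run rather than skipped.

```python
# With an incrementing identifier, a batch that saw A (id 1) and C (id 3)
# but not B (id 2) is detectable: consecutive ids should differ by exactly 1.

def find_safe_prefix(ids, last_processed):
    """Return the highest id safe to mark as done: the end of the
    contiguous run of ids starting at last_processed + 1."""
    expected = last_processed + 1
    for i in sorted(ids):
        if i != expected:
            break  # a gap: some insert has not shown up in the index yet
        expected = i + 1
    return expected - 1

# A and C are visible, B (id 2) is missing due to eventual consistency:
print(find_safe_prefix([1, 3], last_processed=0))  # only id 1 is safe
```

The hard part, as the post says, is producing such a gap-free incrementing identifier in the first place.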
> > The sharded counter is great for counting, but is not good to use as a
> > unique identifier given eventual consistency.
> >
> > I can use the memcache increment function, but the counter might be
> > flushed out of memory at any time. I believe the memcache update speed
> > should be enough for what I want to do.
> >
> > If I had an upper bound on the eventual-consistency delay, I could make
> > my system only process inserts older than that limit.
> >
> > Anyway, those are my thoughts; any feedback is appreciated.
> >
> > BTW: the inserts processed in batches are assumed to be independent of
> > each other.

--
You received this message because you are subscribed to the Google Groups "Google App Engine" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to [email protected].
For more options, visit this group at http://groups.google.com/group/google-appengine?hl=en.
