This is actually a pretty good implementation. The only issue is the size of the processed task list. Instead of having two tasks, I am thinking that the one task will clean up the processed task list before it begins its work: basically, check that the processed inserts have indeed been deleted.
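A minimal sketch of that single-task variant, with in-memory collections standing in for the datastore entities (all names here are illustrative, not App Engine APIs): the task first reconciles last run's processed list by deleting any insert rows that survived, then processes the next batch and records the new keys.

```java
import java.util.*;

// Sketch of the single-task variant: reconcile the processedInserts list
// from the previous run (delete any inserts recorded as processed that
// were not actually deleted), then process the remaining inserts.
public class SingleTaskBatch {
    // Pending insert rows, keyed by insert key; value is the payload.
    static Map<String, String> inserts = new LinkedHashMap<>();
    // Keys recorded as processed in the previous run.
    static List<String> processedInserts = new ArrayList<>();
    // The highly updated object's accumulated state.
    static StringBuilder mainObject = new StringBuilder();

    static void runTask() {
        // Step 1: cleanup. Delete any insert whose key was already recorded
        // as processed in the previous run, then reset the list.
        for (String key : processedInserts) {
            inserts.remove(key);
        }
        processedInserts.clear();

        // Step 2: process the remaining inserts and record their keys, so
        // the next run can verify their deletion before doing new work.
        for (Map.Entry<String, String> e : inserts.entrySet()) {
            mainObject.append(e.getValue());
            processedInserts.add(e.getKey());
        }
    }
}
```

The point of the cleanup pass is that an insert applied last run but whose delete failed is removed before processing, so its content is never applied twice.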
So the processed list records all the inserts processed in the previous run. The task first deletes those inserts if needed, then goes on to process new tasks.

Thanks,
Ravneet

On Jun 15, 11:48 am, Ravi Sharma <[email protected]> wrote:
> In that scenario you can go ahead and do something extra.
>
> Keep a list of keys in your highly updated object, and whenever you
> process one insert and apply it to the main object, make sure you put the
> key in this object's list property. That way your main object will know
> whether it has already got the content of that insert or not.
>
> Then, later, when you are deleting or updating the insert object and you
> get the same insert again (because it failed while you were marking it as
> processed), check if the key exists in the list. If yes, mark the insert
> object processed and also remove it from the list property.
>
> You also then need another job which will clean the list property on the
> updated object: read the key list, get the insert object for each key,
> and if it is marked as processed, remove it from the list.
>
> This will increase your datastore puts, but you will not have to worry
> about inconsistency.
>
> So your code will look like this. The highly updated object will have a
> property like:
>
> List<Key> processedInserts; (in Java JDO)
>
> TASK-1
> 1) Get the next insert object, say i1; assume its key is k1.
>    (At this stage, say, processedInserts is empty.)
> 2) Check if k1 exists in processedInserts. If no, go to step 3; else go
>    to step 4.
> 3) Update the highly updated object with the content of insert object i1,
>    and also add k1 to processedInserts.
>    (At this stage processedInserts contains k1.)
> 4) Mark i1 as processed.
>
> After this you will have a growing processedInserts list, and it has no
> upper bound. To keep it small, you need another job running once in a
> while, or you can submit a task from step 2 when processedInserts.size()
> exceeds some number, say 500.
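The TASK-1 steps above can be sketched like this, with plain String keys and in-memory collections standing in for the JDO entities (the names are illustrative assumptions, not App Engine APIs):

```java
import java.util.*;

// Sketch of TASK-1: apply an insert to the highly updated object only if
// its key is not already in processedInserts, then mark the insert row
// processed. Re-running the same insert after a failed "mark processed"
// step does not apply its content a second time.
public class Task1 {
    static Set<String> processedInserts = new HashSet<>(); // list property on the main object
    static List<String> appliedContent = new ArrayList<>(); // the main object's state
    static Set<String> markedProcessed = new HashSet<>();   // "processed" flag on insert rows

    static void processInsert(String key, String content) {
        // Step 2: skip the update if this insert was already applied (a
        // previous run applied it but failed before marking it processed).
        if (!processedInserts.contains(key)) {
            // Step 3: apply the content and remember the key.
            appliedContent.add(content);
            processedInserts.add(key);
        }
        // Step 4: mark the insert row processed (may be a retry).
        markedProcessed.add(key);
    }
}
```

This is the idempotency trick the thread is describing: steps 3 and 4 cannot run in one transaction, so the key list makes a retried insert a no-op for the main object.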
> TASK-2
> In this task:
> 1) Get the highly updated object.
> 2) Loop through processedInserts.
> 3) Get each insert object; if it is marked processed, delete its key from
>    processedInserts.
>
> Just make sure only one of TASK-1 and TASK-2 is running at a time. You
> can even run TASK-2 as part of TASK-1 after step 4; it's up to you,
> wherever you see it as safe and with fewer if-then-elses :)
>
> On Wed, Jun 15, 2011 at 4:20 PM, thecheatah <[email protected]> wrote:
> > Ravi,
> >
> > Thanks for the feedback. I was thinking exactly along the lines of what
> > you have said. The only problem that I see is that I plan on processing
> > multiple inserts in one batch job. The inserts and the highly updated
> > object will not be updatable in a single transaction. Thus, there might
> > be situations where an insert was processed but the flag was not set or
> > the row was not deleted. To overcome this issue, I am going to either
> > make sure that processing inserts multiple times does not affect the
> > output, or accept a small percentage of failures.
> >
> > Ravneet
> >
> > On Jun 15, 5:18 am, Ravi Sharma <[email protected]> wrote:
> > > If A, B and C are not dependent on each other and ordering doesn't
> > > matter for you (e.g. if you process C, A, B, that is also fine), then
> > > you can put another column in this insert table, say "processed".
> > > When inserting, make it "N" (if string) or false (boolean).
> > >
> > > Then query that entity based on this column. Whenever you process one
> > > row, set the value to "Y" or true, and carry on with the next insert.
> > >
> > > Or you can even delete these rows once you have processed them; then
> > > you will not need the extra column.
> > >
> > > Note: I am assuming that for one update you will be processing all of
> > > its inserts in one task or job, with no multiprocessing.
> > >
> > > On Wed, Jun 15, 2011 at 4:20 AM, thecheatah <[email protected]>
> > > wrote:
> > > > I am trying to implement a system for an object that will be
> > > > updated a lot.
> > > > The way I was thinking was to turn the updates into inserts, then
> > > > have a batch job that executes the inserts in batches to update the
> > > > highly writable object. The inserts can be sorted either by time or
> > > > by some sort of incremented identifier. This identifier or
> > > > timestamp can be stored on the highly writable object so that the
> > > > next time the job runs it knows where to start executing the next
> > > > batch.
> > > >
> > > > Using timestamps I run into a problem with eventual consistency.
> > > > When I query for inserts to execute, some inserts might not make it
> > > > into the results because they have not been added to the index yet.
> > > > So suppose we have inserts A, B and C. If A and C make it into the
> > > > batch job, it will mark all work up to C as completed, and B will
> > > > never be executed.
> > > >
> > > > Using incremented identifiers seems like it would solve the
> > > > problem, but implementing such an identifier is itself not
> > > > straightforward. To explain why it would solve the original
> > > > problem: we would be able to detect the jump from A to C, since the
> > > > difference in the identifiers would be greater than 1. The sharded
> > > > counter is great for counting, but is not good to use as a unique
> > > > identifier given eventual consistency.
> > > >
> > > > I could use the memcache increment function, but the counter might
> > > > be flushed out of memory at any time. I believe the memcache update
> > > > speed should be enough for what I want to do.
> > > >
> > > > If I had an upper-bound time limit on the eventual consistency, I
> > > > could make my system only process inserts older than that time
> > > > limit.
> > > >
> > > > Anyway, those are my thoughts, and any feedback is appreciated.
> > > >
> > > > BTW: The inserts processed in batches are assumed to be independent
> > > > of each other.
> > > >
> > > > --
> > > > You received this message because you are subscribed to the Google
> > > > Groups "Google App Engine" group.
> > > > To post to this group, send email to [email protected].
> > > > To unsubscribe from this group, send email to
> > > > [email protected].
> > > > For more options, visit this group at
> > > > http://groups.google.com/group/google-appengine?hl=en.
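The incremented-identifier idea from the original question can be sketched as a gap check: process inserts in identifier order, but stop at the first jump greater than 1, since the missing insert may simply not be visible yet because of eventual consistency. This is an illustrative sketch with plain long identifiers; the "watermark" stored on the highly writable object is an assumed name, not an App Engine API.

```java
import java.util.*;

// Sketch of gap detection with incremented identifiers: the watermark only
// advances across a contiguous run of identifiers, so an insert missing
// from the query results (B between A and C) is never skipped.
public class GapDetector {
    // Returns the new watermark: the highest identifier up to which the
    // visible inserts (sorted ascending) form a contiguous sequence.
    static long processBatch(long watermark, List<Long> visibleIds) {
        Collections.sort(visibleIds);
        for (long id : visibleIds) {
            if (id == watermark + 1) {
                // Contiguous: safe to process this insert and advance.
                watermark = id;
            } else if (id > watermark + 1) {
                // Gap detected: stop here; the missing insert should show
                // up in a later query once the index catches up.
                break;
            }
            // id <= watermark means it was already processed; skip it.
        }
        return watermark;
    }
}
```

With this check, the A/C batch from the question only advances the watermark to A; B and C are picked up by a later run once B becomes visible.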
