Steve,

No problem. Hopefully some of our questions are beneficial. :) By the
way, as an economist I totally agree with your last remark; I've had
the same thoughts (re performance) many times. In this case I think
Google's incentive is competition -- predominantly from AWS, and
possibly from Azure. If the platform is too unstable or performs too
poorly, people will find a better solution.
If you are using the total from the counter as your id, you might want
to rethink using a sharded counter to generate your id values. If you
get two requests sufficiently close together you will get a duplicate
id. Each shard is in its own entity group, and you cannot read from
other groups within a transaction. That means you are getting your id
value outside a transaction. Of course, if you are actually using
shard_id + the shard's current value as a key_name, you're fine,
provided you get the value in a transaction.

Robert

On Thu, Nov 11, 2010 at 15:36, stevep <[email protected]> wrote:
>
> Robert,
>
> Overall let me say thanks. Your comments really helped for this
> subject.
>
>> allocate_ids can be used to generate a single id, just like doing
>> SomeKind().put() will generate an id automatically. I am not aware of
>> any recommended way to use sharded counters to generate a unique
>> sequence of sequential numbers.
>
> We are using this code (suggested for incrementing counters, but it
> works for sequential key values as well):
> http://code.google.com/appengine/articles/sharding_counters.html
>
> To be honest, I was not aware of allocate_ids when setting this up
> (started coding as noobie_levelZero, now have advanced to
> noobie_levelOne :-)
>
> There seems to be very little overhead in this approach, so I will
> stick with it for now. We download this model's data for some
> analytics processing. Sequential numeric keys are not necessary for
> that, but they are helpful whenever we eyeball the analytics. I'm not
> sure whether allocate_ids key values would be numerically sequenced.
> Once everything else is done, I'll look deeper into comparing these
> two approaches, and will post a thread at that time. That's surely a
> "don't hold your breath" schedule though.
>
>> Eli brought up a good point regarding this issue.
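To make Robert's race concrete, here is a minimal sketch in plain Python (an in-memory dict stands in for the datastore shards; `total` and `next_key_name` are invented illustration names, not App Engine APIs). It shows why a counter total read across shards can hand two requests the same id, while a shard_id + per-shard-value key_name cannot collide:

```python
# Illustrative stand-in for a sharded counter: each shard holds its own count.
shards = {"shard-0": 3, "shard-1": 2}  # total = 5

def total():
    """Sum across shards. On App Engine each shard is its own entity
    group, so this read cannot happen inside a single transaction."""
    return sum(shards.values())

# Two near-simultaneous requests both read the total before either
# one increments a shard: both see 5 and both use 5 as the new id.
id_request_a = total()
id_request_b = total()
assert id_request_a == id_request_b == 5  # duplicate id!

def next_key_name(shard_id):
    """Per-shard scheme: increment one shard and build the key_name from
    shard_id + that shard's own value. This touches a single entity
    group, so it can run inside a transaction."""
    shards[shard_id] += 1
    return "%s-%d" % (shard_id, shards[shard_id])

# Even if two requests hit different shards at the same moment, the
# key_names cannot collide because each embeds its shard id.
k1 = next_key_name("shard-0")
k2 = next_key_name("shard-1")
assert k1 != k2
print(k1, k2)  # shard-0-4 shard-1-3
```

The trade-off is that these key_names are strings and are only ordered within a shard, which is why the thread contrasts them with globally sequential numeric ids.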
>> I assume the reason
>> for this 'complicated' process is to return an id to the client as
>> quickly as possible, so that if the client re-submits the request you
>> do not get duplicated data?
>
> The bigger issue (as per my response to Eli) is to avoid throttling of
> the client response handler.
>
> Unless we screw up the handling of the generated key value from the
> initial POST call, a resend will simply cause a duplicate put(),
> costing us cpu cycles, but not a duplicate record.
>
>> I was more curious about the case when you make a request and then
>> the internet connection fails (or the user hits 'submit' twice really
>> fast). The server will still successfully complete the write, but
>> the client will not know. So how do you prevent the client from
>> re-submitting the request, which might result in a duplicate record?
>
> Right now we're using an Adobe AIR app for the client, with keys
> generated by the client. The process first posts the data to the local
> AIR sqlite DB; then, if the user is online, it attempts the GAE POST.
> If the response works, the returned key value goes into the sqlite DB
> as a reference field. The GUI is disabled while this happens (with an
> appropriate dialog showing). This is all going to change with the new
> browser-only version, so these issues will need to be addressed.
>
>> I totally agree with this approach; I think it is very similar to my
>> process. I do not use the task queue to write the initial record,
>> though; it is put during the request I return the key in. Your
>> approach should be safe because you return the key in the first
>> request, but there could be cases that result in lots of unneeded
>> re-submits -- for instance, if the task queue is backed up.
>
> Thanks -- nice to know I am not missing something obvious.
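The point about a resend costing cpu cycles but not a duplicate record can be sketched in a few lines of plain Python (a dict stands in for the datastore, and `save_record` is a hypothetical handler name, not the poster's actual code). A put() keyed by an explicit, client-generated key_name is an overwrite, so replaying the request is harmless:

```python
# In-memory stand-in for the datastore: key_name -> entity.
datastore = {}

def save_record(key_name, payload):
    """Handler sketch: a put() with an explicit key_name overwrites the
    existing entity, so replaying the same request burns cpu cycles but
    cannot create a second record."""
    datastore[key_name] = payload  # idempotent by key
    return key_name

# The client generates the key, then resends after a dropped response.
key = save_record("client-uuid-42", {"note": "first attempt"})
key_again = save_record("client-uuid-42", {"note": "first attempt"})

assert key == key_again
assert len(datastore) == 1  # a duplicate put(), not a duplicate record
```

This is the same property that makes the AIR client's scheme safe: as long as the client reuses the key it recorded in sqlite, a retry can never fork the data.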
> Writing the
> new rec during the initial POST call is much preferred, as I noted in
> my response to Eli, but I just think we would have too much "double
> whammy" risk when GAE infrastructure is under load -- see the end of
> my response to Eli.**
>
> My thanks again,
> stevep
>
> ** Wouldn't it be nice if the throttling limit were dynamic according
> to how well GAE infrastructure was running -- something we cannot
> control, so why do we end up paying for it? There is IMHO a perverse,
> reverse incentive in the current setup, where profit and revenues
> maximize at the point where GAE infrastructure investments are
> minimized to yield maximum "under load" conditions without causing
> customers to decide other cloud services are clearly superior. Note
> that this also applies to cold-start cpu overhead costs. A dynamic
> throttling algo and a standard cold-start charge (based on standard
> infrastructure performance, not varying due to load) would go a very
> long way toward correcting this perversion. Here's an interesting link
> about the importance of incentives that made me think of GAE's current
> setup when I read it (weird but true):
> http://www.npr.org/blogs/money/2010/09/09/129757852/pop-quiz-how-do-you-stop-sea-captains-from-killing-their-passengers

--
You received this message because you are subscribed to the Google Groups
"Google App Engine" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to
[email protected].
For more options, visit this group at
http://groups.google.com/group/google-appengine?hl=en.
