I have been using a very similar technique for updates and deletes over large sets for over a year. It works extremely well, especially since it adapts to the current App Engine performance weather. Of course, I am watching for timeouts, not API call size issues.

Robert

On Sat, Oct 30, 2010 at 18:13, Eli Jones <[email protected]> wrote:
> I think something similar to what Stephen mentioned is what you should try.
> You're fixating on knowing the exact size of an entity (that will then allow
> you to set an exact batch size)... but, as you've noted, there is no
> convenient or efficient way to do that.
>
> As long as you create your own entity key_names, you can do something naive
> like this:
>
> def batchPut(entityList, batchSize=100):
>     # TooManyError / TooLargeError stand in for the real datastore exceptions
>     count = len(entityList)
>     while count > 0:
>         batchSize = min(count, batchSize)
>         putList = entityList[:batchSize]
>         try:
>             db.put(putList)
>             entityList = entityList[batchSize:]
>             count = len(entityList)
>         except (TooManyError, TooLargeError):
>             if batchSize == 1:
>                 raise  # a single entity is too big; halving cannot help
>             batchSize = batchSize // 2
>
> You can modify something like that to seek out an optimal batch size and
> return the value.
>
> So, if you do batch putting that extends beyond the 30 second limit, you
> can chain a task that passes the optimal batch size off to the next task in
> the chain.
>
> It's not maximally optimal (since there is the chance the first few
> db.put() calls won't work), but it's pretty damn optimal, especially if you
> keep track of the optimal batch size over time.
>
> What are the main reasons something like this won't work for you?
>
> On Sat, Oct 30, 2010 at 3:12 PM, Joshua Smith <[email protected]>
> wrote:
>>
>> I understand the cause of this error. Like I said, I have a bunch of
>> large entities to push into the datastore, and I want to do it as
>> efficiently as possible.
>>
>> But it seems there is no efficient way to find out how big an entity is
>> going to be when crossing the API transom, so there is no way to do these
>> puts optimally.
>>
>> For now, I've added a .size() method to my model, which generates an
>> over-estimate using some heuristics. But that's a hack, and this really
>> should happen under the covers.
>>
>> -Joshua
>>
>> On Oct 30, 2010, at 1:54 PM, Jeff Schwartz wrote:
>>
>> The maximum size for an entity is 1 megabyte. The maximum number of
>> entities in a batch put or delete is 500. These limits can be found at
>> http://code.google.com/appengine/docs/python/datastore/overview.html which
>> also provides information on other datastore limits.
>>
>> So it appears that you are hitting the 1 megabyte limit, either for the
>> total of all entities you are batch putting or for at least one of them.
>>
>> Try using logging while putting the entities individually to isolate and
>> report the offending entity. Catch the exception and dump whatever the
>> entity contains that will identify either where or how it was created in
>> your workflow.
>>
>> Jeff
>>
>> On Sat, Oct 30, 2010 at 1:10 PM, Joshua Smith <[email protected]>
>> wrote:
>>>
>>> It was a lot of big entities. The exception said it was the size, not
>>> the quantity.
>>>
>>> On Oct 30, 2010, at 9:51 AM, Jeff Schwartz wrote:
>>>
>>> How many entities were there when the batch put failed?
>>>
>>> Was it the size of the entities or the number of entities that caused the
>>> batch put to fail?
>>>
>>> Jeff
>>>
>>> On Sat, Oct 30, 2010 at 8:39 AM, Stephen <[email protected]> wrote:
>>>>
>>>> On Oct 29, 6:24 pm, Joshua Smith <[email protected]> wrote:
>>>> > I'm running into a too-large exception when I bulk put a bunch of
>>>> > entities. So obviously, I need to break up my puts into batches. I
>>>> > want to do something like this pseudo code:
>>>> >
>>>> > size = 0
>>>> > list = []
>>>> > for o in objects:
>>>> >     if size + o.size() > 1MB:
>>>> >         db.put(list)
>>>> >         size = 0
>>>> >         list = []
>>>> >     list.append(o)
>>>> >     size += o.size()
>>>> > db.put(list)
>>>> >
>>>> > Any idea what I could use for the "o.size()" method? I could crawl
>>>> > through all the fields and build up an estimate, but it seems likely
>>>> > to me that there is a way to get the API-size of an entity more
>>>> > elegantly.
>>>>
>>>> How about something like:
>>>>
>>>> from google.appengine.api import datastore
>>>> from google.appengine.runtime import apiproxy_errors
>>>>
>>>> def put_all(entities, **kw):
>>>>     try:
>>>>         return datastore.Put(entities, **kw)
>>>>     except apiproxy_errors.RequestTooLargeError:
>>>>         if len(entities) <= 1:
>>>>             raise  # a single entity is too large; splitting cannot help
>>>>         n = len(entities) // 2
>>>>         a, b = entities[:n], entities[n:]
>>>>         # list.extend() returns None, so concatenate the key lists instead
>>>>         return put_all(a, **kw) + put_all(b, **kw)
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "Google App Engine" group.
>>>> To post to this group, send email to [email protected].
>>>> To unsubscribe from this group, send email to
>>>> [email protected].
>>>> For more options, visit this group at
>>>> http://groups.google.com/group/google-appengine?hl=en.
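
Eli's idea of seeking out a batch size that works and handing it on to the next task can be sketched roughly as below. This is only an illustration, not anyone's production code: batch_put, put_fn and too_large_errors are hypothetical names, with put_fn standing in for something like db.put and too_large_errors for whichever exception(s) the datastore raises on an oversized call (apiproxy_errors.RequestTooLargeError in Stephen's version).

```python
def batch_put(entities, put_fn, too_large_errors, batch_size=100):
    """Put entities in batches, halving the batch size whenever a put fails.

    put_fn and too_large_errors are stand-ins (assumptions) for the real
    datastore put call and its "request too large" exception(s).
    Returns (keys, batch_size) so the last working size can be reused.
    """
    keys = []
    remaining = list(entities)
    while remaining:
        # Never ask for more than is left, but keep batch_size itself intact
        # so a short final batch does not shrink the remembered value.
        n = min(batch_size, len(remaining))
        try:
            result = put_fn(remaining[:n])
        except too_large_errors:
            if n == 1:
                raise  # one entity alone is over the limit; shrinking cannot help
            batch_size = n // 2
            continue
        keys.extend(result)
        del remaining[:n]
    return keys, batch_size
```

In a real handler this might be called as batch_put(entities, db.put, apiproxy_errors.RequestTooLargeError), with the returned batch size passed as a parameter to the next task in the chain. The starting value of 100 is just the default from Eli's example.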
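
For the size-based approach in Joshua's pseudo code, the grouping step on its own might look something like the sketch below. The names chunk_by_size, size_of, max_bytes and max_count are made up for illustration; size_of is whatever estimate you trust (Joshua's heuristic .size() method, for example), and the default of 500 for max_count comes from the batch limit Jeff quoted. One candidate estimator sometimes suggested is len(db.model_to_protobuf(entity).Encode()), but that has not been checked here, so treat it as an assumption.

```python
def chunk_by_size(items, size_of, max_bytes, max_count=500):
    """Yield lists of items whose estimated sizes sum to at most max_bytes.

    size_of is a caller-supplied estimator (an assumption, not a datastore
    API); each yielded list also holds at most max_count items.
    """
    batch, total = [], 0
    for item in items:
        size = size_of(item)
        if size > max_bytes:
            # No batch can ever hold this item, so fail loudly.
            raise ValueError("single item exceeds max_bytes: %d" % size)
        # Flush the current batch before it would overflow either limit.
        if batch and (total + size > max_bytes or len(batch) >= max_count):
            yield batch
            batch, total = [], 0
        batch.append(item)
        total += size
    if batch:
        yield batch  # flush whatever is left over
```

Each yielded list would then be handed to a single put call. Because the estimate may be off, it probably still makes sense to catch the too-large exception around each put rather than rely on the estimate alone.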
