I'm from the school where you avoid exceptions when possible. Sorry, but this code you wrote turns my stomach. Serialize multiple times *knowing* that an exception will be thrown? The whole point of my question was that I wanted the optimal batch put, which *should* be computationally trivial.
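For illustration only, the size-based batching asked for in the original question can be sketched as below. This is a minimal sketch under stated assumptions, not an App Engine API: `batch_by_size` and its `size_of` argument are hypothetical names, `size_of` stands in for whatever estimate is available (for example the `.size()` heuristic mentioned further down the thread), and the default budgets simply mirror the 1 megabyte and 500-entity limits quoted in the thread.

```python
def batch_by_size(items, size_of, max_bytes=1000000, max_count=500):
    """Yield lists of items whose estimated total size fits the budget.

    size_of is a caller-supplied estimator (hypothetical; nothing in the
    datastore API provides it). max_bytes and max_count are assumptions
    taken from the limits quoted in the thread.
    """
    batch, total = [], 0
    for item in items:
        n = size_of(item)
        if n > max_bytes:
            # No batch can hold this item, so fail early and loudly.
            raise ValueError("single item exceeds the size budget")
        # Flush the current batch before it would overflow either limit.
        if batch and (total + n > max_bytes or len(batch) >= max_count):
            yield batch
            batch, total = [], 0
        batch.append(item)
        total += n
    if batch:
        yield batch


# Stand-in data: strings sized with len() instead of real entities.
batches = list(batch_by_size(["aaaa", "bbb", "cc", "d"], len, max_bytes=6))
```

With real entities the loop would presumably look like `for batch in batch_by_size(entities, estimate): db.put(batch)`, where `estimate` is the over-estimating heuristic. Since the estimate is only a heuristic, leaving some headroom below the hard limit seems prudent.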
Anyway, I think my original question has been answered, which is that Google forgot to put a get_serialization_size() kind of method into db.Model, and they should either do that or, better yet, deal with the 1MB limit under the covers.

On Oct 30, 2010, at 6:13 PM, Eli Jones wrote:

> I think something similar to what Stephen mentioned is what you should try.
>
> You're fixating on knowing the exact size of an entity (that will then allow
> you to set an exact batch size)... but, as you've noted, there is no
> convenient or efficient way to do that.
>
> As long as you create your own entity key_names, you can do something naive
> like this:
>
> def batchPut(entityList, batchSize=100):
>     putList = []
>     count = len(entityList)
>     while count > 0:
>         batchSize = min(count, batchSize)
>         putList = entityList[:batchSize]
>         try:
>             db.put(putList)
>             entityList = entityList[batchSize:]
>             count = len(entityList)
>         except (TooManyError, TooLargeError):
>             if batchSize == 1:
>                 raise  # a single entity is over the limit
>             batchSize = batchSize // 2
>
> You can modify something like that to seek out an optimal batch size.. and
> return the value..
>
> So, if you do batch putting that extends beyond the 30 second limit.. you can
> chain a task that passes the optimal batch size off to the next task in the
> chain.
>
> It's not maximally optimal (since there is the chance the first few db.put()
> calls won't work), but it's pretty damn optimal.. especially if you keep
> track of the optimal batch size over time.
>
> What are the main reasons something like this won't work for you?
>
> On Sat, Oct 30, 2010 at 3:12 PM, Joshua Smith <[email protected]>
> wrote:
> I understand the cause of this error. Like I said, I have a bunch of large
> entities to push into the datastore, and I want to do it as efficiently as
> possible.
>
> But it seems there is no efficient way to find out how big an entity is going
> to be when crossing the API transom, so there is no way to do these puts
> optimally.
>
> For now, I've added a .size() method to my model, which generates an
> over-estimate using some heuristics. But that's a hack, and this really
> should happen under the covers.
>
> -Joshua
>
> On Oct 30, 2010, at 1:54 PM, Jeff Schwartz wrote:
>
>> The maximum size for an entity is 1 megabyte. The maximum number of entities
>> in a batch put or delete is 500. These limits can be found at
>> http://code.google.com/appengine/docs/python/datastore/overview.html which
>> also provides information on other datastore limits.
>>
>> So it appears that you are hitting the 1 megabyte limit, either for the
>> total of all entities you are batch putting or for at least one of them.
>>
>> Try using logging while putting the entities individually to isolate and
>> report the offending entity. Catch the exception and dump whatever the
>> entity contains that will identify either where or how it was created in
>> your workflow.
>>
>> Jeff
>>
>> On Sat, Oct 30, 2010 at 1:10 PM, Joshua Smith <[email protected]>
>> wrote:
>> It was a lot of big entities. The exception said it was the size, not the
>> quantity.
>>
>> On Oct 30, 2010, at 9:51 AM, Jeff Schwartz wrote:
>>
>>> How many entities were there when the batch put failed?
>>>
>>> Was it the size of the entities or the number of entities that caused the
>>> batch put to fail?
>>>
>>> Jeff
>>>
>>> On Sat, Oct 30, 2010 at 8:39 AM, Stephen <[email protected]> wrote:
>>>
>>>
>>> On Oct 29, 6:24 pm, Joshua Smith <[email protected]> wrote:
>>> > I'm running into a too-large exception when I bulk put a bunch of
>>> > entities. So obviously, I need to break up my puts into batches. I want
>>> > to do something like this pseudo code:
>>> >
>>> > size = 0
>>> > list = []
>>> > for o in objects:
>>> >     if size + o.size() > 1MB:
>>> >         db.put(list)
>>> >         size = 0
>>> >         list = []
>>> >     list.append(o)
>>> >     size += o.size()
>>> > if list:
>>> >     db.put(list)
>>> >
>>> > Any idea what I could use for the "o.size()" method?
>>> > I could crawl through all the fields and build up an estimate, but it
>>> > seems likely to me that there is a way to get the API-size of an entity
>>> > more elegantly.
>>>
>>>
>>> How about something like:
>>>
>>>
>>> from google.appengine.api import datastore
>>> from google.appengine.runtime import apiproxy_errors
>>>
>>> def put_all(entities, **kw):
>>>     try:
>>>         return datastore.Put(entities, **kw)
>>>     except apiproxy_errors.RequestTooLargeError:
>>>         if len(entities) <= 1:
>>>             raise  # a single entity exceeds the request limit
>>>         n = len(entities) // 2
>>>         a, b = entities[:n], entities[n:]
>>>         # list.extend() returns None, so concatenate the key lists
>>>         return put_all(a, **kw) + put_all(b, **kw)
>>>
>>> --
>>> You received this message because you are subscribed to the Google Groups
>>> "Google App Engine" group.
>>> To post to this group, send email to [email protected].
>>> To unsubscribe from this group, send email to
>>> [email protected].
>>> For more options, visit this group at
>>> http://groups.google.com/group/google-appengine?hl=en.
>>>
>>> --
>>> Jeff
>>
>> --
>> Jeff
