I'm from the school where you avoid exceptions when possible.

Sorry, but this code you wrote turns my stomach.  Serialize multiple times 
*knowing* that an exception will be thrown?  The whole point of my question was 
that I wanted the optimal batch put, which *should* be computationally trivial.

Anyway, I think my original question has been answered, which is that Google 
forgot to put a get_serialization_size() kind of method into db.Model. They 
should either add that or, better yet, handle the 1MB limit under the covers.
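
For reference, the closest I can get today seems to be serializing the entity 
myself, which is exactly the cost I was hoping to avoid. A rough, untested 
sketch (the helper name is mine, and the actual RPC adds some overhead on top 
of this number):

from google.appengine.ext import db

def estimated_entity_size(entity):
    # Serializes the entity once via model_to_protobuf, so this is an
    # up-front cost rather than the free accessor I wish db.Model exposed.
    return db.model_to_protobuf(entity).ByteSize()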

On Oct 30, 2010, at 6:13 PM, Eli Jones wrote:

> I think something similar to what Stephen mentioned is what you should try.
> 
> You're fixating on knowing the exact size of an entity (that will then allow 
> you to set an exact batch size)... but, as you've noted, there is no 
> convenient or efficient way to do that.
> 
> As long as you create your own entity key_names, you can do something naive 
> like this:
> 
> def batchPut(entityList, batchSize=100):
>     count = len(entityList)
>     while count > 0:
>         batchSize = min(count, batchSize)
>         putList = entityList[:batchSize]
>         try:
>             db.Put(putList)
>             entityList = entityList[batchSize:]
>             count = len(entityList)
>         except (TooManyError, TooLargeError):
>             # placeholder exception names -- catch whatever the datastore
>             # actually raises (e.g. apiproxy_errors.RequestTooLargeError)
>             batchSize = max(1, batchSize // 2)
>     return batchSize  # the batch size that ended up working
> 
> You can modify something like that to seek out an optimal batch size and 
> return the value.
> 
> So, if you do batch putting that extends beyond the 30-second limit, you can 
> chain a task that passes the optimal batch size off to the next task in the 
> chain.
> 
> It's not maximally optimal (since there is a chance the first few db.Put() 
> calls won't work), but it's pretty damn optimal, especially if you keep track 
> of the optimal batch size over time (something like the sketch below).
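> 
> Very rough, untested sketch of the chaining part (the names are mine, it 
> assumes batchPut returns the batch size it settled on, and deferred needs 
> its builtin enabled in app.yaml):
> 
> from google.appengine.ext import db, deferred
> 
> def chainedPut(entityKeys, batchSize=100):
>     # Put up to 500 entities in this task, then hand the batch size that
>     # worked off to the next task in the chain.
>     entities = db.get(entityKeys[:500])
>     batchSize = batchPut(entities, batchSize)
>     remaining = entityKeys[500:]
>     if remaining:
>         deferred.defer(chainedPut, remaining, batchSize)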
> 
> What are the main reasons something like this won't work for you?       
> 
> On Sat, Oct 30, 2010 at 3:12 PM, Joshua Smith <[email protected]> 
> wrote:
> I understand the cause of this error.  Like I said, I have a bunch of large 
> entities to push into the datastore, and I want to do it as efficiently as 
> possible.
> 
> But it seems there is no efficient way to find out how big an entity is going 
> to be when crossing the API transom, so there is no way to do these puts 
> optimally.
> 
> For now, I've added a .size() method to my model, which generates an 
> over-estimate using some heuristics.  But that's a hack, and this really 
> should happen under the covers.
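> 
> (For the curious, the hack is roughly this shape; the paddings and fudge 
> factors below are illustrative, not my exact numbers:)
> 
>     # inside my db.Model subclass
>     def size(self):
>         # Deliberate over-estimate: count the obvious payload bytes and pad
>         # generously for the key, property names, and protobuf framing.
>         total = 1024  # fudge factor for the key and per-entity overhead
>         for name in self.properties():
>             value = getattr(self, name)
>             if isinstance(value, basestring):
>                 total += len(value) * 2 + 64  # *2 to stay above UTF-8 size
>             else:
>                 total += 128  # crude upper bound for non-string properties
>         return total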
> 
> -Joshua
> 
> On Oct 30, 2010, at 1:54 PM, Jeff Schwartz wrote:
> 
>> The maximum size for an entity is 1 megabyte. The maximum number of entities 
>> in a batch put or delete is 500. These limits can be found at 
>> http://code.google.com/appengine/docs/python/datastore/overview.html which 
>> also provides information on other datastore limits.
>> 
>> So it appears that you are hitting the 1 megabyte limit, either for the 
>> total of all entities you are batch putting or for at least one of them.
>> 
>> Try using logging while putting the entities individually to isolate and 
>> report the offending entity. Catch the exception and dump whatever the 
>> entity contains that will identify either where or how it was created in 
>> your workflow.
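>> 
>> For example (untested, and the exception to catch may vary; 
>> RequestTooLargeError is the one raised for oversized calls):
>> 
>> import logging
>> from google.appengine.ext import db
>> from google.appengine.runtime import apiproxy_errors
>> 
>> def put_individually(entities):
>>     # Put one entity at a time so a failure points at a single entity.
>>     for entity in entities:
>>         try:
>>             db.put(entity)
>>         except apiproxy_errors.RequestTooLargeError:
>>             logging.error('Oversized entity: kind=%s key=%s',
>>                           entity.kind(),
>>                           entity.key().name() if entity.has_key() else None)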
>> 
>> Jeff
>> 
>> On Sat, Oct 30, 2010 at 1:10 PM, Joshua Smith <[email protected]> 
>> wrote:
>> It was a lot of big entities.  The exception said it was the size, not the 
>> quantity.
>> 
>> On Oct 30, 2010, at 9:51 AM, Jeff Schwartz wrote:
>> 
>>> How many entities were there when the batch put failed?
>>> 
>>> Was it the size of the entities or the number of entities that caused the 
>>> batch put to fail?
>>> 
>>> Jeff
>>> 
>>> On Sat, Oct 30, 2010 at 8:39 AM, Stephen <[email protected]> wrote:
>>> 
>>> 
>>> On Oct 29, 6:24 pm, Joshua Smith <[email protected]> wrote:
>>> > I'm running into a too-large exception when I bulk put a bunch of 
>>> > entities.  So obviously, I need to break up my puts into batches.  I want 
>>> > to do something like this pseudo code:
>>> >
>>> > size = 0
>>> > batch = []
>>> > for o in objects:
>>> >   if size + o.size() > 1MB:
>>> >     db.put(batch)
>>> >     size = 0
>>> >     batch = []
>>> >   batch.append(o)
>>> >   size += o.size()
>>> > if batch:
>>> >   db.put(batch)
>>> >
>>> > Any idea what I could use for the "o.size()" method?  I could crawl 
>>> > through all the fields and build up an estimate, but it seems likely to 
>>> > me that there is a way to get the API-size of an entity more elegantly.
>>> 
>>> 
>>> How about something like:
>>> 
>>> 
>>> from google.appengine.api import datastore
>>> from google.appengine.runtime import apiproxy_errors
>>> 
>>> def put_all(entities, **kw):
>>>     try:
>>>         return datastore.Put(entities, **kw)
>>>     except apiproxy_errors.RequestTooLargeError:
>>>         if len(entities) == 1:
>>>             raise  # a single entity over the limit can't be split further
>>>         n = len(entities) // 2
>>>         a, b = entities[:n], entities[n:]
>>>         # concatenate the two key lists (list.extend returns None)
>>>         return put_all(a, **kw) + put_all(b, **kw)