I have been using a very similar technique for updates and deletes
over large sets for over a year.  It works extremely well, especially
since it adapts to the current App Engine performance weather.  Of
course, I am watching for timeouts rather than API call size issues.
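
A minimal sketch of such an adaptive loop, under stated assumptions:
the names adaptive_apply, op, retry_on and max_size are made up for
illustration, and growing the batch again after a success is only one
guess at what "adapting" could mean.  In a real app, op would be
something like db.put or db.delete, and retry_on whatever timeout
exceptions you actually see.

```python
def adaptive_apply(items, op, retry_on, batch_size=100, max_size=500):
    """Apply op to items in batches, halving the batch on failure.

    op and retry_on are placeholders: op would be e.g. db.put or db.delete,
    retry_on the exception class (or tuple of classes) that signals trouble.
    Returns the batch size in use when the work finished.
    """
    items = list(items)
    while items:
        batch = items[:batch_size]
        try:
            op(batch)
        except retry_on:
            if len(batch) == 1:
                raise  # even a single item fails; shrinking cannot help
            batch_size = len(batch) // 2
        else:
            items = items[len(batch):]
            # Assumption: creep back up after a success, capped at max_size.
            batch_size = min(max_size, batch_size + 1)
    return batch_size
```

The returned size can be stored and used as the starting point for the
next run, so the loop does not have to rediscover it every time.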



Robert





On Sat, Oct 30, 2010 at 18:13, Eli Jones <[email protected]> wrote:
> I think something similar to what Stephen mentioned is what you should try.
> You're fixating on knowing the exact size of an entity (that will then allow
> you to set an exact batch size)... but, as you've noted, there is no
> convenient or efficient way to do that.
> As long as you create your own entity key_names, you can do something naive
> like this:
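> def batchPut(entityList, batchSize=100):
>     # TooManyError and TooLargeError are placeholders for the exceptions
>     # your put actually raises (e.g. apiproxy_errors.RequestTooLargeError).
>     count = len(entityList)
>     while count > 0:
>         batchSize = min(count, batchSize)
>         putList = entityList[:batchSize]
>         try:
>             db.put(putList)
>             entityList = entityList[batchSize:]
>             count = len(entityList)
>         except (TooManyError, TooLargeError):
>             if batchSize == 1:
>                 raise  # a single entity is too big; halving cannot help
>             batchSize = batchSize // 2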
> You can modify something like that to seek out an optimal batch size and
> return the value.
> So, if you do batch putting that extends beyond the 30 second limit, you
> can chain a task that passes the optimal batch size off to the next task in
> the chain.
> It's not maximally optimal (since there is the chance the first few
> db.put() calls won't work), but it's pretty damn optimal, especially if you
> keep track of the optimal batch size over time.
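
The chaining idea can be sketched as follows.  This is only an
illustration: put_until_deadline, put_fn, too_big, budget_seconds and
clock are hypothetical names, and actually enqueueing the follow-up
task is left out.  The function stops when its time budget is spent and
hands back both the unsaved entities and the batch size that last
worked, so the next task can start from there.

```python
import time

def put_until_deadline(entities, put_fn, too_big, batch_size=100,
                       budget_seconds=20.0, clock=time.time):
    """Put batches until the time budget runs out.

    Returns (remaining_entities, batch_size) for the next task in the chain.
    put_fn and too_big are placeholders for the real put call and the
    exception(s) it raises when a batch is rejected.
    """
    deadline = clock() + budget_seconds
    remaining = list(entities)
    while remaining and clock() < deadline:
        batch = remaining[:batch_size]
        try:
            put_fn(batch)
        except too_big:
            if len(batch) == 1:
                raise  # one entity alone is rejected; halving cannot help
            batch_size = len(batch) // 2
        else:
            remaining = remaining[len(batch):]
    return remaining, batch_size
```

If the first element of the result is non-empty, the caller would pass
both values on to the next task instead of starting again at the
default batch size.
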
> What are the main reasons something like this won't work for you?
> On Sat, Oct 30, 2010 at 3:12 PM, Joshua Smith <[email protected]>
> wrote:
>>
>> I understand the cause of this error.  Like I said, I have a bunch of
>> large entities to push into the datastore, and I want to do it as
>> efficiently as possible.
>> But it seems there is no efficient way to find out how big an entity is
>> going to be when crossing the API transom, so there is no way to do these
>> puts optimally.
>> For now, I've added a .size() method to my model, which generates an
>> over-estimate using some heuristics.  But that's a hack, and this really
>> should happen under the covers.
>> -Joshua
>> On Oct 30, 2010, at 1:54 PM, Jeff Schwartz wrote:
>>
>> The maximum size for an entity is 1 megabyte. The maximum number of
>> entities in a batch put or delete is 500. These limits can be found at
>> http://code.google.com/appengine/docs/python/datastore/overview.html which
>> also provides information on other datastore limits.
>>
>> So it appears that you are hitting the 1 megabyte limit, either for the
>> total of all entities you are batch putting or for at least one of them.
>>
>> Try using logging while putting the entities individually to isolate and
>> report the offending entity. Catch the exception and dump whatever the
>> entity contains that will identify either where or how it was created in
>> your workflow.
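
A rough sketch of that diagnostic loop, assuming nothing about the
model: put_one and describe are hypothetical callables standing in for
the real put call and for whatever identifies an entity in your
workflow (its key name, say).  It puts entities one at a time, logs
each failure with the standard logging module, and returns the ones
that failed.

```python
import logging

def find_offenders(entities, put_one, describe=repr):
    """Put entities one by one; log and collect the ones that fail."""
    offenders = []
    for entity in entities:
        try:
            put_one(entity)
        except Exception:  # broad on purpose: this is a diagnostic pass
            # logging.exception records the message plus the traceback.
            logging.exception("put failed for %s", describe(entity))
            offenders.append(entity)
    return offenders
```

This is slow, since it makes one call per entity, so it only makes
sense as a one-off debugging pass.
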
>>
>> Jeff
>>
>> On Sat, Oct 30, 2010 at 1:10 PM, Joshua Smith <[email protected]>
>> wrote:
>>>
>>> It was a lot of big entities.  The exception said it was the size, not
>>> the quantity.
>>> On Oct 30, 2010, at 9:51 AM, Jeff Schwartz wrote:
>>>
>>> How many entities were there when the batch put failed?
>>>
>>> Was it the size of the entities or the number of entities that caused the
>>> batch put to fail?
>>>
>>> Jeff
>>>
>>> On Sat, Oct 30, 2010 at 8:39 AM, Stephen <[email protected]> wrote:
>>>>
>>>>
>>>> On Oct 29, 6:24 pm, Joshua Smith <[email protected]> wrote:
>>>> > I'm running into a too-large exception when I bulk put a bunch of
>>>> > entities.  So obviously, I need to break up my puts into batches.  I
>>>> > want to do something like this pseudo code:
>>>> >
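>>>> > size = 0
>>>> > batch = []
>>>> > for o in objects:
>>>> >   # flush before the running total would cross the 1 MB limit
>>>> >   if batch and size + o.size() > 1024 * 1024:
>>>> >     db.put(batch)
>>>> >     size = 0
>>>> >     batch = []
>>>> >   batch.append(o)
>>>> >   size += o.size()
>>>> > if batch:
>>>> >   db.put(batch)  # put whatever is left over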
>>>> >
>>>> > Any idea what I could use for the "o.size()" method?  I could crawl
>>>> > through all the fields and build up an estimate, but it seems likely to
>>>> > me that there is a way to get the API-size of an entity more elegantly.
>>>>
>>>>
>>>> How about something like:
>>>>
>>>>
>>>> from google.appengine.api import datastore
>>>> from google.appengine.runtime import apiproxy_errors
>>>>
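>>>> def put_all(entities, **kw):
>>>>    # entities is assumed to be a list, so Put returns a list of keys
>>>>    try:
>>>>        return datastore.Put(entities, **kw)
>>>>    except apiproxy_errors.RequestTooLargeError:
>>>>        if len(entities) < 2:
>>>>            raise  # a single entity is too large; splitting cannot help
>>>>        n = len(entities) // 2
>>>>        a, b = entities[:n], entities[n:]
>>>>        # list.extend() returns None, so concatenate the key lists instead
>>>>        return put_all(a, **kw) + put_all(b, **kw)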
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "Google App Engine" group.
>>>> To post to this group, send email to [email protected].
>>>> To unsubscribe from this group, send email to
>>>> [email protected].
>>>> For more options, visit this group at
>>>> http://groups.google.com/group/google-appengine?hl=en.
>>>>
>>>
>>>
>>>
>>> --
>>> Jeff
>>>
>>
>>
>>
>> --
>> Jeff
>>
>
