I think you should try something similar to what Stephen suggested.
You're fixating on knowing the exact size of an entity (which would then let
you set an exact batch size), but, as you've noted, there is no
convenient or efficient way to do that.
As long as you create your own entity key_names, you can do something naive
like this:
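from google.appengine.ext import db

def batchPut(entityList, batchSize=100):
    # TooManyError and TooLargeError are placeholders for whatever the
    # datastore actually raises (e.g. apiproxy_errors.RequestTooLargeError).
    while entityList:
        batchSize = min(len(entityList), batchSize)
        try:
            db.put(entityList[:batchSize])
            entityList = entityList[batchSize:]
        except (TooManyError, TooLargeError):
            if batchSize == 1:
                raise  # a single entity is too large; halving won't help
            batchSize = batchSize // 2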
You can modify something like that to seek out an optimal batch size and
return the value.
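Here is a rough sketch of that modification, not a tested App Engine implementation. The names batch_put_adaptive, put and retry_on are made up for illustration: put stands in for db.put, and retry_on is whatever tuple of exceptions means "this batch was too big". Passing them in keeps the sketch runnable outside App Engine.

```python
def batch_put_adaptive(entities, put, retry_on, batch_size=100):
    """Put every entity; return the batch size that ended up working.

    put      -- callable that takes a list of entities (assumed: db.put)
    retry_on -- tuple of exception classes meaning "batch too big"
    """
    start = 0
    while start < len(entities):
        size = min(batch_size, len(entities) - start)
        try:
            put(entities[start:start + size])
        except retry_on:
            if size == 1:
                raise  # one entity alone is too big; halving cannot help
            batch_size = size // 2  # a batch of `size` failed, so cap below it
            continue
        start += size
    return batch_size
```

The returned value is the current cap, i.e. the last size that was not seen to fail, so it only shrinks when a put is actually rejected.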
So, if your batch puts run past the 30 second request limit, you
can chain a task that passes the optimal batch size off to the next task in
the chain.
It's not maximally optimal (since there is a chance the first few
db.put() calls will fail), but it's pretty damn close, especially if you keep
track of the optimal batch size over time.
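To make the chaining idea concrete, here is a minimal sketch under some assumptions. put_until_deadline, put, retry_on, budget_seconds and clock are hypothetical names; put stands in for db.put, and the actual enqueueing of the next task is left out because it depends on how you set up your queue. The function stops when its time budget is used up and returns what is left plus the batch size it settled on.

```python
import time

def put_until_deadline(entities, put, retry_on, batch_size=100,
                       budget_seconds=20.0, clock=time.time):
    """Put batches until the time budget runs out.

    Returns (remaining_entities, batch_size) so the caller can hand both
    to the next task in the chain.
    """
    deadline = clock() + budget_seconds
    remaining = list(entities)
    while remaining and clock() < deadline:
        size = min(batch_size, len(remaining))
        try:
            put(remaining[:size])
        except retry_on:
            if size == 1:
                raise  # one entity alone is too big; halving cannot help
            batch_size = size // 2
            continue
        remaining = remaining[size:]
    return remaining, batch_size
```

If remaining comes back non-empty, the next task would call the same function with remaining and the returned batch_size, so it does not have to rediscover the size from scratch.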
What are the main reasons something like this won't work for you?
On Sat, Oct 30, 2010 at 3:12 PM, Joshua Smith <[email protected]> wrote:
> I understand the cause of this error. Like I said, I have a bunch of large
> entities to push into the datastore, and I want to do it as efficiently as
> possible.
>
> But it seems there is no efficient way to find out how big an entity is
> going to be when crossing the API transom, so there is no way to do these
> puts optimally.
>
> For now, I've added a .size() method to my model, which generates an
> over-estimate using some heuristics. But that's a hack, and this really
> should happen under the covers.
>
> -Joshua
>
> On Oct 30, 2010, at 1:54 PM, Jeff Schwartz wrote:
>
> The maximum size for an entity is 1 megabyte. The maximum number of
> entities in a batch put or delete is 500. These limits can be found at
> http://code.google.com/appengine/docs/python/datastore/overview.html which
> also provides information on other datastore limits.
>
> So it appears that you are hitting the 1 megabyte limit, either for the
> total of all entities you are batch putting or for at least one of them.
>
> Try using logging while putting the entities individually to isolate and
> report the offending entity. Catch the exception and dump whatever the
> entity contains that will identify either where or how it was created in
> your workflow.
>
> Jeff
>
> On Sat, Oct 30, 2010 at 1:10 PM, Joshua Smith <[email protected]> wrote:
>
>> It was a lot of big entities. The exception said it was the size, not the
>> quantity.
>>
>> On Oct 30, 2010, at 9:51 AM, Jeff Schwartz wrote:
>>
>> How many entities were there when the batch put failed?
>>
>> Was it the size of the entities or the number of entities that caused the
>> batch put to fail?
>>
>> Jeff
>>
>> On Sat, Oct 30, 2010 at 8:39 AM, Stephen <[email protected]> wrote:
>>
>>>
>>>
>>> On Oct 29, 6:24 pm, Joshua Smith <[email protected]> wrote:
>>> > I'm running into a too-large exception when I bulk put a bunch of
>>> > entities. So obviously, I need to break up my puts into batches. I want to
>>> > do something like this pseudo code:
>>> >
>>> > size = 0
>>> > batch = []
>>> > for o in objects:
>>> >     if size + o.size() > 1024 * 1024:  # 1 MB
>>> >         db.put(batch)
>>> >         size = 0
>>> >         batch = []
>>> >     batch.append(o)
>>> >     size += o.size()
>>> > if batch:
>>> >     db.put(batch)
>>> >
>>> > Any idea what I could use for the "o.size()" method? I could crawl
>>> > through all the fields and build up an estimate, but it seems likely to me
>>> > that there is a way to get the API-size of an entity more elegantly.
>>>
>>>
>>> How about something like:
>>>
>>>
>>> from google.appengine.api import datastore
>>> from google.appengine.runtime import apiproxy_errors
>>>
>>> def put_all(entities, **kw):
>>>     try:
>>>         return datastore.Put(entities, **kw)
>>>     except apiproxy_errors.RequestTooLargeError:
>>>         n = len(entities) // 2
>>>         if n == 0:
>>>             raise  # a single entity is too large on its own
>>>         # list.extend() returns None, so concatenate the key lists instead
>>>         return put_all(entities[:n], **kw) + put_all(entities[n:], **kw)
>>>
>>> --
>>> You received this message because you are subscribed to the Google Groups
>>> "Google App Engine" group.
>>> To post to this group, send email to [email protected].
>>> To unsubscribe from this group, send email to
>>> [email protected].
>>> For more options, visit this group at
>>> http://groups.google.com/group/google-appengine?hl=en.