Josh,

You've got to twist your mind into thinking about optimistic locking and how
that mindset might apply to other programming problems.

The idea is that your system learns an optimal batch size and eventually
suffers almost no exceptions.  So you know an exception can be thrown, but
you have faith in a loving god above us all that it won't happen very often.
And you "know" (presuming entity size does not vary by 2000% or something)
that, if done correctly, it will save you a lot more time and resources over
the long run than obsessively calculating and re-calculating entity sizes.
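
Here's a minimal sketch of what I mean (adaptive_put is a made-up name, and
I'm assuming the RequestTooLargeError that shows up later in this thread is
the exception you'd actually see):

from google.appengine.ext import db
from google.appengine.runtime import apiproxy_errors

def adaptive_put(entities, batch_size=100):
    # Optimistically put slices at the size that worked last time; only
    # shrink when the datastore actually complains.  (Assumes no single
    # entity is itself over the 1MB limit.)
    while entities:
        n = min(batch_size, len(entities))
        try:
            db.put(entities[:n])
            entities = entities[n:]
        except apiproxy_errors.RequestTooLargeError:
            # We guessed wrong: pay the penalty once, halve, and retry.
            batch_size = max(1, batch_size // 2)
    return batch_size  # the learned value, to carry forward to the next run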

I know this isn't the same thing as optimistic concurrency control or
optimistic locking, but it is similar.  You transition from the mindset of
trying to control as much as possible and prevent every possible
exception... to just sort of "hey man... 99% chance this will work...
peace... love" (and resource savings).

Of course, if you don't have a consistent entity size, or if something else
causes exceptions to be thrown on every batch put, then it won't be of much
use (unless throwing the exception and retrying costs less than estimating
every entity size before putting a batch).

Once I became familiar with the concept of optimistic locking in database
design, it sort of warped my brain in all sorts of other ways... and I
haven't recovered since.  (Then eventually you begin to accept multiple ways
of searching out nearly-optimal solutions instead of absolutely-optimal
ones... and it's all sort of downhill from there.)

On Sun, Oct 31, 2010 at 8:10 AM, Joshua Smith <[email protected]> wrote:

> I'm from the school where you avoid exceptions when possible.
>
> Sorry, but this code you wrote turns my stomach.  Serialize multiple times
> *knowing* that an exception will be thrown?  The whole point of my question
> was that I wanted the optimal batch put, which *should* be computationally
> trivial.
>
> Anyway, I think my original question has been answered, which is that
> Google forgot to put a get_serialization_size() kind of method into
> db.Model, and they should either do that, or better yet, deal with the 1MB
> limit under the covers.
>
> On Oct 30, 2010, at 6:13 PM, Eli Jones wrote:
>
> I think something similar to what Stephen mentioned is what you should try.
>
> You're fixating on knowing the exact size of an entity (that will then
> allow you to set an exact batch size)... but, as you've noted, there is no
> convenient or efficient way to do that.
>
> As long as you create your own entity key_names, you can do something naive
> like this:
>
> from google.appengine.ext import db
> from google.appengine.runtime import apiproxy_errors
>
> def batchPut(entityList, batchSize=100):
>     while entityList:
>         batchSize = min(len(entityList), batchSize)
>         putList = entityList[:batchSize]
>         try:
>             db.put(putList)
>             entityList = entityList[batchSize:]
>         except (db.BadRequestError, apiproxy_errors.RequestTooLargeError):
>             # Batch too large (or too many entities): halve and retry.
>             batchSize = max(1, batchSize / 2)
>
> You can modify something like that to seek out an optimal batch size and
> return the value.
>
> So, if you do batch putting that extends beyond the 30-second request
> limit, you can chain a task that passes the optimal batch size off to the
> next task in the chain, like the sketch below.
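>
> For instance (a rough sketch; it assumes batchPut has been modified to
> return the batch size that last worked, that the entities pickle cleanly
> so deferred can serialize them, and the 1000-per-task chunk is arbitrary):
>
> from google.appengine.ext import deferred
>
> def put_task(entities, batch_size=100):
>     # Put one chunk in this task, then chain the rest along with the
>     # batch size we learned, so the next task starts from a good guess.
>     chunk, rest = entities[:1000], entities[1000:]
>     batch_size = batchPut(chunk, batch_size)
>     if rest:
>         deferred.defer(put_task, rest, batch_size)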
>
> It's not maximally optimal (since there is a chance the first few db.put()
> calls won't work), but it's pretty damn optimal... especially if you keep
> track of the optimal batch size over time.
>
> What are the main reasons something like this won't work for you?
>
> On Sat, Oct 30, 2010 at 3:12 PM, Joshua Smith <[email protected]> wrote:
>
>> I understand the cause of this error.  Like I said, I have a bunch of
>> large entities to push into the datastore, and I want to do it as
>> efficiently as possible.
>>
>> But it seems there is no efficient way to find out how big an entity is
>> going to be when crossing the API transom, so there is no way to do these
>> puts optimally.
>>
>> For now, I've added a .size() method to my model, which generates an
>> over-estimate using some heuristics.  But that's a hack, and this really
>> should happen under the covers.
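>>
>> (A sketch of one such heuristic, for the curious; this is an illustration,
>> not the actual method, and the overhead constants are guesses:)
>>
>> from google.appengine.ext import db
>>
>> class SizedModel(db.Model):
>>     def size(self):
>>         # Deliberately over-estimate: measure every property's repr and
>>         # pad generously for keys, property names, and proto overhead.
>>         total = 1024  # guessed fixed per-entity overhead
>>         for name in self.properties():
>>             total += len(name) + len(repr(getattr(self, name))) + 64
>>         return total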
>>
>> -Joshua
>>
>> On Oct 30, 2010, at 1:54 PM, Jeff Schwartz wrote:
>>
>> The maximum size for an entity is 1 megabyte. The maximum number of
>> entities in a batch put or delete is 500. These limits can be found at
>> http://code.google.com/appengine/docs/python/datastore/overview.html, which
>> also provides information on other datastore limits.
>>
>> So it appears that you are hitting the 1 megabyte limit, either for the
>> total of all entities you are batch putting or for at least one of them.
>>
>> Try using logging while putting the entities individually to isolate and
>> report the offending entity.  Catch the exception and dump whatever the
>> entity contains that will identify where or how it was created in your
>> workflow.  Something like the sketch below.
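>>
>> (A minimal sketch of that; it assumes the entities were built with
>> key_names, so entity.key() works even before the put succeeds:)
>>
>> import logging
>> from google.appengine.ext import db
>>
>> def find_offenders(entities):
>>     for entity in entities:
>>         try:
>>             db.put(entity)
>>         except Exception:
>>             # logging.exception records the traceback plus identifying
>>             # details, so the offending entity can be traced back.
>>             logging.exception('put failed: kind=%s key_name=%r',
>>                               entity.kind(), entity.key().name())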
>>
>> Jeff
>>
>> On Sat, Oct 30, 2010 at 1:10 PM, Joshua Smith <[email protected]> wrote:
>>
>>> It was a lot of big entities.  The exception said it was the size, not
>>> the quantity.
>>>
>>> On Oct 30, 2010, at 9:51 AM, Jeff Schwartz wrote:
>>>
>>> How many entities were there when the batch put failed?
>>>
>>> Was it the size of the entities or the number of entities that caused the
>>> batch put to fail?
>>>
>>> Jeff
>>>
>>> On Sat, Oct 30, 2010 at 8:39 AM, Stephen <[email protected]> wrote:
>>>
>>>>
>>>>
>>>> On Oct 29, 6:24 pm, Joshua Smith <[email protected]> wrote:
>>>> > I'm running into a too-large exception when I bulk put a bunch of
>>>> > entities.  So obviously, I need to break up my puts into batches.  I
>>>> > want to do something like this pseudo code:
>>>> >
>>>> > size = 0
>>>> > list = []
>>>> > for o in objects:
>>>> >   if size + o.size() > 1000000:  # ~1MB API limit
>>>> >     db.put(list)
>>>> >     size = 0
>>>> >     list = []
>>>> >   list.append(o)
>>>> >   size += o.size()
>>>> > if list:
>>>> >   db.put(list)  # flush the remainder
>>>> >
>>>> > Any idea what I could use for the "o.size()" method?  I could crawl
>>>> > through all the fields and build up an estimate, but it seems likely to
>>>> > me that there is a way to get the API-size of an entity more elegantly.
>>>>
>>>>
>>>> How about something like:
>>>>
>>>>
>>>> from google.appengine.api import datastore
>>>> from google.appengine.runtime import apiproxy_errors
>>>>
>>>> def put_all(entities, **kw):
>>>>     try:
>>>>         return datastore.Put(entities, **kw)
>>>>     except apiproxy_errors.RequestTooLargeError:
>>>>         if len(entities) == 1:
>>>>             raise  # a single entity over the limit can't be split
>>>>         # Too big: split in half and put each half recursively.
>>>>         n = len(entities) / 2
>>>>         a, b = entities[:n], entities[n:]
>>>>         # list.extend() returns None, so concatenate the key lists.
>>>>         return put_all(a, **kw) + put_all(b, **kw)
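>>>>
>>>> Usage would be, say (datastore.Entity here just stands in for whatever
>>>> entities you're actually writing):
>>>>
>>>> entities = [datastore.Entity('MyKind') for _ in xrange(1000)]
>>>> keys = put_all(entities)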