On Oct 30, 1:39 pm, Stephen <[email protected]> wrote:
> On Oct 29, 6:24 pm, Joshua Smith <[email protected]> wrote:
>
> > I'm running into a too-large exception when I bulk put a bunch of entities. 
> >  So obviously, I need to break up my puts into batches.  I want to do 
> > something like this pseudo code:
>
> > size = 0
> > list = []
> > for o in objects:
> >   if size + o.size() > 1MB:
> >     db.put(list)
> >     size = 0
> >     list = []
> >   list.append(o)
> >   size += o.size()
> > db.put(list)
>
> > Any idea what I could use for the "o.size()" method?  I could crawl through 
> > all the fields and build up an estimate, but it seems likely to me that 
> > there is a way to get the API-size of an entity more elegantly.
>
> How about something like:
>
> from google.appengine.api import datastore
> from google.appengine.runtime import apiproxy_errors
>
> def put_all(entities, **kw):
>     try:
>         return datastore.Put(entities, **kw)
>     except apiproxy_errors.RequestTooLargeError:
>         n = len(entities) // 2
>         a, b = entities[:n], entities[n:]
>         return put_all(a, **kw) + put_all(b, **kw)


The general idea of the above code is to rely on the apiproxy_stub to
accurately measure the RPC size and to split the batch if it is too
big. But if you regularly try to put() large batches, you suffer the
same overhead already mentioned: every retry converts each model to an
entity and then to a protobuf all over again.
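
To put a rough number on that overhead, here is a toy sketch in plain
Python with no App Engine involved. Everything in it (Item, send, LIMIT)
is made up for illustration; send() just stands in for a put() that
serializes its arguments on every attempt and rejects oversized requests.

```python
class Item(object):
    """Hypothetical stand-in for a model; counts how often it is serialized."""
    conversions = 0

    def __init__(self, payload):
        self.payload = payload

    def to_bytes(self):
        Item.conversions += 1
        return self.payload.encode('utf-8')


LIMIT = 10  # made-up request size limit, in bytes


def send(items):
    # Serializes every item on every attempt, as a put() would.
    data = [item.to_bytes() for item in items]
    if sum(len(d) for d in data) > LIMIT:
        raise ValueError('request too large')
    return len(data)


def put_all(items):
    try:
        return send(items)
    except ValueError:
        if len(items) <= 1:
            raise  # a single oversized item cannot be split further
        n = len(items) // 2
        return put_all(items[:n]) + put_all(items[n:])


items = [Item('x' * 4) for _ in range(8)]
stored = put_all(items)
print(stored, Item.conversions)
```

With these made-up sizes the batch of 8 has to be halved twice before it
fits, so each item is serialized once per attempt it takes part in.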

So how about something like this (untested...):


import logging

from google.appengine.api import datastore
from google.appengine.ext.db import Model
from google.appengine.runtime import apiproxy_errors

def put_all(models, **kw):
    rpc = datastore.GetRpcFromKwargs(kw)
    models, multiple = datastore.NormalizeAndTypeCheck(models, Model)
    assert multiple
    entities = [model._populate_internal_entity(_entity_class=_CachedEntity)
                for model in models]
    return _put_or_split(entities, rpc, **kw)

def _put_or_split(entities, rpc, **kw):
    try:
        return datastore.Put(entities, rpc=rpc, **kw)
    except apiproxy_errors.RequestTooLargeError:
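        if len(entities) <= 1:
            # A single entity that is too large cannot be split further.
            raise
        n = len(entities) // 2
        a, b = entities[:n], entities[n:]
        logging.warn('batch put of %d entities failed,'
                     ' trying batches of %d and %d',
                     len(entities), len(a), len(b))
        # list.extend() returns None, so concatenate the two key lists.
        return (_put_or_split(a, rpc, **kw) +
                _put_or_split(b, rpc, **kw))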

class _CachedEntity(datastore.Entity):
    def _ToPb(self, **kw):
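        # Single underscore: a '__cached_pb' string passed to getattr() is
        # not name-mangled, so it would never match self.__cached_pb.
        if getattr(self, '_cached_pb', None) is None:
            self._cached_pb = super(_CachedEntity, self)._ToPb(**kw)
        return self._cached_pb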
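
One Python detail worth checking in a cache like _CachedEntity: an
attribute written as self.__name inside a class body is name-mangled,
while a string handed to getattr() is not, so the two can silently refer
to different attributes. A small sketch in plain Python, with the
hypothetical classes Mangled and Plain standing in for the entity class:

```python
class Mangled(object):
    calls = 0

    def value(self):
        # The string '__cached' is looked up literally...
        if getattr(self, '__cached', None) is None:
            Mangled.calls += 1
            # ...but this assignment is stored as _Mangled__cached.
            self.__cached = 'pb'
        return self.__cached


class Plain(object):
    calls = 0

    def value(self):
        # A single underscore is not mangled, so lookup and store agree.
        if getattr(self, '_cached', None) is None:
            Plain.calls += 1
            self._cached = 'pb'
        return self._cached


m, p = Mangled(), Plain()
m.value()
m.value()
p.value()
p.value()
print(Mangled.calls, Plain.calls)
```

In the Mangled version the expensive branch runs on every call, so the
cache never takes effect; the Plain version computes the value once.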
