On Oct 30, 1:39 pm, Stephen <[email protected]> wrote:
> On Oct 29, 6:24 pm, Joshua Smith <[email protected]> wrote:
>
> > I'm running into a too-large exception when I bulk put a bunch of entities.
> > So obviously, I need to break up my puts into batches. I want to do
> > something like this pseudo code:
>
> > size = 0
> > for o in objects:
> >     if size + o.size() > 1MB:
> >         db.put(list)
> >         size = 0
> >         list = []
> >     list.append(o)
>
> > Any idea what I could use for the "o.size()" method? I could crawl through
> > all the fields and build up an estimate, but it seems likely to me that
> > there is a way to get the API-size of an entity more elegantly.
>
> How about something like:
>
> from google.appengine.api import datastore
> from google.appengine.runtime import apiproxy_errors
>
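> def put_all(entities, **kw):
>     try:
>         return datastore.Put(entities, **kw)
>     except apiproxy_errors.RequestTooLargeError:
>         if len(entities) < 2:
>             raise  # a single entity is too large; splitting cannot help
>         n = len(entities) // 2
>         a, b = entities[:n], entities[n:]
>         # list.extend() returns None, so concatenate the key lists
>         return put_all(a, **kw) + put_all(b, **kw)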

The general idea of the code above is to rely on the apiproxy stub to
measure the RPC size accurately and to split the batch when it is too
big. However, if you regularly put() large batches, you suffer the
overhead already mentioned: every retry converts each model to an
entity and then to a protobuf again.
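
To make that overhead concrete, here is a minimal stand-alone sketch that
needs no App Engine SDK. FakeModel, fake_put and FakeTooLarge are
hypothetical stand-ins I made up for illustration (they are not SDK names),
and the byte limit is arbitrary. It only counts how often each object gets
serialized while a batch is split and retried.

```python
class FakeTooLarge(Exception):
    """Stand-in for apiproxy_errors.RequestTooLargeError (assumption)."""


class FakeModel(object):
    """Hypothetical model that counts how often it is serialized."""

    def __init__(self, name, size):
        self.name = name
        self.size = size
        self.serialize_calls = 0

    def to_bytes(self):
        self.serialize_calls += 1
        return b'x' * self.size


def fake_put(models, limit):
    """Serialize every model, then reject the batch if it exceeds limit."""
    payload = [m.to_bytes() for m in models]
    if sum(len(p) for p in payload) > limit:
        raise FakeTooLarge()
    return [m.name for m in models]


def put_all(models, limit):
    """Try the whole batch; on failure, split it in half and retry."""
    try:
        return fake_put(models, limit)
    except FakeTooLarge:
        if len(models) < 2:
            raise  # one oversized model cannot be split further
        n = len(models) // 2
        return put_all(models[:n], limit) + put_all(models[n:], limit)


# Eight 10-byte models against a 25-byte limit: the batches of 8 and 4
# fail, the batches of 2 succeed, so each model is serialized 3 times.
models = [FakeModel('m%d' % i, 10) for i in range(8)]
keys = put_all(models, 25)
calls = [m.serialize_calls for m in models]
```

In this toy run every model is serialized three times for a single
successful write, which is the repeated work a cached protobuf avoids.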
So how about something like this (untested...):

import logging

from google.appengine.api import datastore
from google.appengine.ext import db
from google.appengine.runtime import apiproxy_errors


def put_all(models, **kw):
    rpc = datastore.GetRpcFromKwargs(kw)
    models, multiple = datastore.NormalizeAndTypeCheck(models, db.Model)
    assert multiple
    # Convert each model to an entity once, up front.
    entities = [model._populate_internal_entity(_entity_class=_CachedEntity)
                for model in models]
    return _put_or_split(entities, rpc, **kw)


def _put_or_split(entities, rpc, **kw):
    try:
        return datastore.Put(entities, rpc=rpc, **kw)
    except apiproxy_errors.RequestTooLargeError:
        if len(entities) < 2:
            raise  # a single entity is too large; splitting cannot help
        n = len(entities) // 2
        a, b = entities[:n], entities[n:]
        logging.warning('batch put of %d entities failed,'
                        ' trying batches of %d and %d',
                        len(entities), len(a), len(b))
        # list.extend() returns None, so concatenate the key lists instead.
        return _put_or_split(a, rpc, **kw) + _put_or_split(b, rpc, **kw)


class _CachedEntity(datastore.Entity):
    """Entity that builds its protobuf once and reuses it on retries."""

    def _ToPb(self, **kw):
        # A single leading underscore avoids name mangling, so getattr()
        # reads the same attribute that is assigned below.
        if getattr(self, '_cached_pb', None) is None:
            self._cached_pb = super(_CachedEntity, self)._ToPb(**kw)
        return self._cached_pb
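
For comparison, the size-tracking loop from the original question could be
sketched roughly as below. This is only an illustration: batches_by_size and
its size_of parameter are names I made up, and the caller has to supply the
per-item size estimate. On App Engine something like
len(db.model_to_protobuf(m).Encode()) might serve as size_of, but I have not
tested that, and the RPC adds some overhead of its own, so leave headroom
below the real limit.

```python
def batches_by_size(items, size_of, max_bytes):
    """Yield lists of items whose combined size stays within max_bytes.

    size_of is a caller-supplied function returning an item's size in
    bytes (hypothetical; how to measure an entity is up to the caller).
    """
    batch, total = [], 0
    for item in items:
        item_size = size_of(item)
        if item_size > max_bytes:
            raise ValueError('single item of %d bytes exceeds the limit'
                             % item_size)
        # Flush the current batch before it would overflow.
        if batch and total + item_size > max_bytes:
            yield batch
            batch, total = [], 0
        batch.append(item)
        total += item_size
    # Emit whatever is left after the loop.
    if batch:
        yield batch


# Usage with plain byte strings and len() as the size function.
example = list(batches_by_size([b'aaaa', b'bb', b'ccc', b'd'], len, 6))
```

Each yielded batch would then go to db.put() on its own. Unlike the
split-on-failure approach, this depends on the size estimate being close to
what the API actually measures.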