Re: [google-appengine] Can anyone explain 5.8 million DB operations?

Jeff Schnitzer Thu, 05 Jul 2012 09:43:26 -0700

On Thu, Jul 5, 2012 at 7:14 AM, Barry Hunter <[email protected]> wrote:
>>
>> It was the update that was taking all this time. Updates from remote shell
>> are very slow. So it took 3 hrs for 1000 or so entities that it updated and
>> consumed 5.8m OPs in the process!
>
> Ah, well there you have your answer. The remote shell is not efficient
> for certain things.
>
> Looking back at the code you posted in the thread, looks like reading
> https://developers.google.com/appengine/articles/remote_api


Wow, that limitations section is interesting.  And kinda bizarre.
Google, why does the remote_api need to use offsets when it could be
using cursors?  My guess is that the remote_api code simply hasn't
been updated since cursors were invented.  This is worth filing an
issue over.

Another issue is that the OP is reporting 5.8M *read* operations;
offsets should cause *small* operations.  Either the OP is confusing
us or this is another major issue.
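
To make the offset-versus-cursor difference concrete, here is a toy in-memory sketch. It is not the remote_api's code, and the function names and the "skipped" counter are made up for illustration. It only assumes that an offset query has to walk past every skipped result before returning a batch, while a cursor resumes where the last batch stopped.

```python
def page_with_offsets(entities, batch_size):
    """Fetch everything by re-running the query with a growing offset."""
    returned = skipped = 0
    offset = 0
    while offset < len(entities):
        skipped += offset  # assumed cost: the store walks past `offset` results
        batch = entities[offset:offset + batch_size]
        returned += len(batch)
        offset += len(batch)
    return returned, skipped


def page_with_cursors(entities, batch_size):
    """Fetch everything by resuming from where the previous batch ended."""
    returned = skipped = 0
    cursor = 0  # stand-in for an opaque datastore cursor
    while cursor < len(entities):
        batch = entities[cursor:cursor + batch_size]
        returned += len(batch)
        cursor += len(batch)
    return returned, skipped
```

In this model the skipped work grows roughly with the square of the entity count when paging by offset, and stays at zero with cursors. The real billing numbers depend on how the datastore charges for skipped results, which this sketch does not try to reproduce.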

Sarang:  The real lesson here is that you should not try to map/reduce
your entities across the remote api.  If you're more careful about how
you iterate you can do it, but it's really too slow.  There is a
"right way" to do this on GAE:

 * Iterate over the entire keyset
 * For each key, enqueue a task
 * Each task loads-modifies-saves the entity in a transaction

This is what the map/reduce framework does.  It's clever about
iterating over the keyset because it can do that in parallel
("mapping" by dividing up the keyspace into multiple queries), and of
course the tasks all can run in parallel.  Depending on how many
resources you're willing to allocate to the problem, you can update
tens of thousands of entities per second.
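
A rough sketch of the pattern above, in plain Python so it runs anywhere. The dict, the deque, and every function name here are hypothetical stand-ins, not App Engine or map/reduce framework APIs: the dict plays the datastore, the deque plays the task queue, and on GAE the load-modify-save step would run inside a real transaction.

```python
from collections import deque

# Hypothetical stand-ins for the datastore and the task queue.
datastore = {"user%03d" % i: {"visits": i} for i in range(20)}
task_queue = deque()


def split_keyspace(keys, shards):
    """Divide the sorted keys into contiguous ranges, one per mapper."""
    keys = sorted(keys)
    size = max(1, -(-len(keys) // shards))  # ceiling division, at least 1
    return [keys[i:i + size] for i in range(0, len(keys), size)]


def mapper(key_range):
    """Iterate one slice of the keyset and enqueue a task per key."""
    for key in key_range:
        task_queue.append(key)


def update_entity(key):
    """Load-modify-save one entity (a transaction on the real datastore)."""
    entity = dict(datastore[key])  # load a copy
    entity["visits"] += 1          # modify
    datastore[key] = entity        # save


def run(shards=4):
    """Run the mappers, then drain the queue; returns the tasks processed."""
    for key_range in split_keyspace(datastore, shards):
        mapper(key_range)
    processed = 0
    while task_queue:
        update_entity(task_queue.popleft())
        processed += 1
    return processed
```

Here everything runs sequentially in one process. On GAE the mappers and the per-entity tasks would each be separate task queue requests, which is where the parallelism comes from.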

You can either use the map/reduce framework, or whip up your own.  The
"reduce" part is trivial - a task that modifies your entity as you
like.  If you have small numbers of entities or you are willing to
wait, a simple iteration over your keyset is easy enough.  Queries
time out after 60s or so, so if you have more than ~100k entities you
will need to stop, grab a cursor, and enqueue a new task that
continues the query from that cursor.  Use named tasks so that a mapper
task that gets repeated doesn't go haywire and spin up duplicate mapper
tasks.
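
Here is a minimal sketch of the cursor-plus-named-task idea, again as a toy model rather than real task queue code. NamedQueue, mapper_task, run_job and the "job1-mapper-N" naming scheme are all invented for illustration. The only behavior assumed from the real task queue is that it refuses a task name it has already seen (the real service reports this as an error rather than a return value).

```python
class NamedQueue:
    """Toy queue that refuses a task name it has already accepted."""

    def __init__(self):
        self.seen = set()
        self.pending = []

    def add(self, name, payload):
        if name in self.seen:
            return False  # duplicate name: the task is not enqueued again
        self.seen.add(name)
        self.pending.append((name, payload))
        return True


def mapper_task(queue, keys, job_id, step, cursor, batch_size):
    """Handle one batch, then chain a continuation from the new cursor."""
    batch = keys[cursor:cursor + batch_size]
    # A real mapper would enqueue one update task per key in `batch` here.
    next_cursor = cursor + len(batch)
    if next_cursor < len(keys):
        name = "%s-mapper-%d" % (job_id, step + 1)  # deterministic name
        return queue.add(name, {"step": step + 1, "cursor": next_cursor})
    return False


def run_job(keys, batch_size, retry_step=None):
    """Drive the chain; optionally run one step twice to mimic a retry."""
    queue = NamedQueue()
    queue.add("job1-mapper-0", {"step": 0, "cursor": 0})
    executed = 0
    while queue.pending:
        _, payload = queue.pending.pop(0)
        runs = 2 if payload["step"] == retry_step else 1
        for _ in range(runs):
            mapper_task(queue, keys, "job1", payload["step"],
                        payload["cursor"], batch_size)
            executed += 1
    return executed, len(queue.seen)
```

Because the continuation's name is derived from the job and step, a mapper that runs twice tries to enqueue the same name twice and the second attempt is dropped, so the chain never forks. With unnamed tasks each repeat would start its own chain of continuations.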

Of course, since you're doing this server-side, you will need to
upload code.  That shouldn't be a big deal.  If you want you can even
do this on a non-default version; tasks which are started on a
non-default version (i.e. nondefault.yourappid.appspot.com) get executed
on that same version.  Just watch out because task queue definitions
are shared among all versions; either use the same queue.yaml or don't
upload the task queue definitions when you upload the nondefault
version.

Jeff
