I am having similar problems using the bulkupdate library, which was something of a precursor to MapReduce. bulkupdate iterates over a query instead of fetching, and I've found that to be buggy and unreliable:
http://code.google.com/p/googleappengine/issues/detail?id=4046

Could it be that MapReduce uses the iterator interface to the query (for i in q) instead of fetching batches of entities? That would explain why your custom delete job, which uses fetch, takes less time to complete than the MR job.

Pascal

On Nov 15, 1:18 am, Eli Jones <[email protected]> wrote:
> From what I could tell, the map reduce delete job took up several times more
> CPU time (and wall clock time) than my custom delete job usually took.
>
> My usual utility class uses this method for deletes:
>
> 1. Create a query for all entities in a model with keys_only = True.
> 2. Fetch 100 keys.
> 3. Issue a deferred task to delete those 100 key names.
> 4. Use a cursor to fetch 100 more, and issue deferred deletes until the
> query returns no more entities.
>
> This is usually pretty fast, since the only bottleneck is the time it
> takes to fetch 100 key names and add the deferred task. The surprising fact
> was that the default map reduce delete from the Datastore Admin page took
> so much CPU.
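The four-step keys-only delete pattern Eli describes can be sketched roughly as below. This is a plain-Python simulation, not real App Engine code: FakeDatastore, fetch_keys, and the task list are illustrative stand-ins for a keys_only db.Query, query cursors, and deferred.defer respectively, so the control flow is clear without the SDK.

```python
# Sketch of the keys-only batched-delete pattern described above.
# FakeDatastore, fetch_keys, and the deferred_tasks list are hypothetical
# stand-ins for the datastore, a keys_only query with cursors, and the
# deferred task queue.

BATCH_SIZE = 100

class FakeDatastore:
    """Stand-in for the datastore: holds only entity keys."""
    def __init__(self, n):
        self.keys = ["key%d" % i for i in range(n)]

    def fetch_keys(self, limit, cursor=0):
        """Keys-only fetch: return up to `limit` keys plus the next cursor."""
        batch = self.keys[cursor:cursor + limit]
        return batch, cursor + len(batch)

    def delete(self, keys):
        keyset = set(keys)
        self.keys = [k for k in self.keys if k not in keyset]

def delete_all(store):
    """Steps 2-4: fetch a batch of keys, enqueue a deferred delete,
    advance the cursor, repeat until the query returns nothing."""
    deferred_tasks = []
    cursor = 0
    while True:
        batch, cursor = store.fetch_keys(BATCH_SIZE, cursor)
        if not batch:
            break
        # In real code this would be deferred.defer(db.delete, batch);
        # here we just queue the work to run later.
        deferred_tasks.append(lambda b=batch: store.delete(b))
    # Simulate the task queue draining.
    for task in deferred_tasks:
        task()

store = FakeDatastore(250)
delete_all(store)
print(len(store.keys))  # 0 once all deferred deletes have run
```

The point of the pattern is that the request only pays for fetching 100 key names and enqueuing a task; the actual deletes happen in the background, which is why it tends to beat the MapReduce delete on both CPU and wall-clock time.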
