Hey Eli,

Thanks for the additional info. I'll have to watch for similar issues, as
we are getting ready to test loading and deleting lots of data again.
Just a note: we use self-sizing batches to help speed up the deletes. We
start with a batch size of 100, then keep stepping it up by 15% until we
hit timeouts; once we hit timeouts, we back off in 5% increments until the
timeouts stop. This helped us out a bunch when wiping out our largest
model. We find the batch sizes seem to stabilize around 375 for that model.

Robert

On Sun, Feb 21, 2010 at 1:31 PM, Eli Jones <[email protected]> wrote:
> Do you mean on the Main page of the Dashboard for my app? No, there was
> no indication of throttling there.
>
> Mainly, I would get timeouts from appengine_console.py when running this
> command:
>
>     result = db.GqlQuery("Select __key__ from MyBigModel").fetch(1)
>
> while fetching from another Model would work fine:
>
>     result = db.GqlQuery("Select __key__ from MyRegularModel").fetch(1)
>
> It looks like I could do .get_by_key_name() explicitly.. but I don't
> even know which key names are still in the Model, and I can't view the
> Datastore to find out. I tried guessing.. but no luck so far.
>
> Also, I left the automated task running overnight to continue doing
> db.delete() in batches of 100 on the model.. but after 8 hours, it had
> managed only 1,600 batches. That's 200 batches per hour and about 3 per
> minute. So.. it was only managing to db.delete() about 300 entities per
> minute over the last 8 hours.
>
> My guess is that these entities are just sprawled out all over the
> place.. and doing a .fetch(100) on them just makes it cry.
>
> I just gave the datastore an hour and a half break to collect itself..
> but now it just times out when the task tries to fetch(100). It times
> out on fetch(20) too..
>
> It's a little silly to see that clearing out 78 MB of entities would
> bog down the datastore this much.. but I guess I can understand, since
> I wasn't being very polite to the datastore and just aggressively told
> it to keep deleting over and over.
>
> I think I've deleted about 410,000 entities so far.. I was able to do
> about 250,000 in batches of 500.. then I had to shift down to batches
> of 100, and that managed to wipe another 160,000. If I remember
> correctly, this leaves me with about 140,000 left in there somewhere.
> But now I have to wait for the Datastore to cooperate.
>
> Next time, I may try doing db.delete() using pre-generated lists of
> key_names.. the fetch() seems to be what is causing all the trouble.
>
> For those who like logs, here are the DeadlineExceededErrors I would
> see in the log (this would be for when it was trying to do fetch(100)
> for a "Select __key__ from MyBigModel"):
>
>   File "/base/python_lib/versions/1/google/appengine/ext/db/__init__.py", line 1616, in fetch
>     raw = raw_query.Get(limit, offset, rpc=rpc)
>   File "/base/python_lib/versions/1/google/appengine/api/datastore.py", line 1183, in Get
>     limit=limit, offset=offset, prefetch_count=limit, **kwargs)._Get(limit)
>   File "/base/python_lib/versions/1/google/appengine/api/datastore.py", line 1110, in _Run
>     datastore_pb.QueryResult(), rpc)
>   File "/base/python_lib/versions/1/google/appengine/api/datastore.py", line 176, in _MakeSyncCall
>     rpc.wait()
>   File "/base/python_lib/versions/1/google/appengine/api/apiproxy_stub_map.py", line 460, in wait
>     self.__rpc.Wait()
>   File "/base/python_lib/versions/1/google/appengine/api/apiproxy_rpc.py", line 112, in Wait
>     rpc_completed = self._WaitImpl()
>   File "/base/python_lib/versions/1/google/appengine/runtime/apiproxy.py", line 108, in _WaitImpl
>     rpc_completed = _apphosting_runtime___python__apiproxy.Wait(self)
>
> On Sun, Feb 21, 2010 at 10:12 AM, Robert Kluin <[email protected]> wrote:
>>
>> Hi Eli,
>> Did you happen to look at the App Engine Console after the datastore
>> was "napping"? A few months ago, when clearing a test datastore, we
>> hit a similar thing. When we looked at the App Engine Console, it said
>> the app was being temporarily throttled.
>>
>> Was just curious if that is what you encountered too?
>>
>> Robert
>>
>> On Sat, Feb 20, 2010 at 11:49 PM, Eli Jones <[email protected]> wrote:
>> > As a side note.. once I hit about 250,000 entities deleted, it seems
>> > the datastore took a nap on me..
>> >
>> > So, now I'm waiting for it to finish whatever it is doing underneath
>> > before I can continue deleting.
>> >
>> > Took a nap = db.delete() or .fetch(1) from the model times out.
>> > Though, I can .fetch() from my other Models just fine. I figure it
>> > is just shuffling the data around and merging it to new tablets.
>> >
>> > Granted.. the Model has been unavailable (I'm just judging this by
>> > seeing if I can do a .fetch(1) using the appengine_console from my
>> > local machine) for 40 minutes.. which is much longer than the brief
>> > period of tablet unavailability mentioned here:
>> > http://code.google.com/appengine/articles/handling_datastore_errors.html
>> >
>> > Ah.. seems I still can't .fetch from appengine_console, but I can
>> > sort of limp along and delete 100 at a time (though it takes about
>> > 16 seconds to delete 100 entities now) using the task I have set up
>> > to do this.
>> >
>> > On Sat, Feb 20, 2010 at 9:54 PM, Eli Jones <[email protected]> wrote:
>> >>
>> >> I am currently going through the process of deleting 500,000
>> >> entities from my datastore.
>> >>
>> >> Here are the different stats I have so far for db.delete():
>> >>
>> >>   100 entities =  2,179 API_CPU
>> >>   200 entities =  4,345 API_CPU
>> >>   500 entities = 10,845 API_CPU
>> >>
>> >> So.. it doesn't seem like you get better per-entity API_CPU for
>> >> deleting more at once. It seems to average about 21 API_CPU per
>> >> entity deleted. There doesn't really seem to be a general time
>> >> benefit either. It seems to average about 1 to 2 seconds per 100
>> >> entities deleted.
>> >>
>> >> On Sat, Feb 20, 2010 at 3:21 AM, kang <[email protected]> wrote:
>> >>>
>> >>> I'm going to clear the datastore.
>> >>> I use the following code:
>> >>>
>> >>>   old_date = datetime.datetime(2009, 10, 1)
>> >>>   old_updates = SomeUpdate.all().filter("updated <", old_date).fetch(20)
>> >>>   db.delete(old_updates)
>> >>>
>> >>> It costs me nearly 1982 cpu_ms / 1945 api_cpu_ms every time. Is it
>> >>> normal?
>> >>>
>> >>> --
>> >>> Stay hungry, Stay foolish.
>> >>>
>> >>> --
>> >>> You received this message because you are subscribed to the Google
>> >>> Groups "Google App Engine" group.
>> >>> To post to this group, send email to
>> >>> [email protected].
>> >>> To unsubscribe from this group, send email to
>> >>> [email protected].
>> >>> For more options, visit this group at
>> >>> http://groups.google.com/group/google-appengine?hl=en.
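P.S. In case the self-sizing batch idea is useful to anyone else, here is
a rough sketch of the sizing logic (hypothetical standalone code with a
stand-in delete call and Timeout exception, not our actual task):

```python
# Sketch of self-sizing delete batches: start at 100, grow the batch 15%
# after each successful delete, and shrink it 5% after each timeout until
# the timeouts stop. Integer math avoids float rounding surprises.

class Timeout(Exception):
    """Stand-in for the datastore's deadline/timeout errors."""

def next_batch_size(current, timed_out):
    """Step the batch size up 15% on success, down 5% on timeout."""
    if timed_out:
        return max(1, current * 95 // 100)
    return current * 115 // 100

def delete_in_batches(delete_batch, total, start_size=100):
    """Delete `total` entities via `delete_batch(n)`, self-sizing batches.

    `delete_batch(n)` is a caller-supplied function that attempts to
    delete up to n entities, returns the number actually deleted, and
    raises Timeout when the datastore times out.
    """
    size = start_size
    deleted = 0
    while deleted < total:
        try:
            deleted += delete_batch(min(size, total - deleted))
            size = next_batch_size(size, timed_out=False)
        except Timeout:
            size = next_batch_size(size, timed_out=True)
    return deleted
```

In the real task, `delete_batch` would do the keys-only fetch plus
db.delete() and translate deadline errors into Timeout; the sizes then
oscillate just under whatever limit the datastore will tolerate.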
