On Wed, Jul 15, 2009 at 7:03 PM, richard emberson<[email protected]> wrote:
>
> Nick,
>
> Again, thank you for your response.
>
> Two items.
>
> First item:
> Is there some Java API to the short-term quotas, so that I can
> write code that "knows" how fast to go and when to stop?
> If not, are the upload short-term rate limits published
> somewhere? Are they based on the number of requests (which I can
> control), the number of bytes stored per minute (which I can
> control), GAE CPU time (which I cannot measure), or some
> other measure?
Not currently, no. Your best bet is to write your uploader to run until
it gets over-quota errors, then back off a bit. The rates apply to all
the quotas that are accounted for on App Engine, and constitute a
fraction of your available daily quota for each. Increasing your daily
limits through billing will also increase the short-term buckets; in
some cases, simply enabling billing raises the limits.

> Second item:
> Thanks for talking about the Python bulk loader. It gave me
> an idea. Just as I extract data from the protocol layer without
> creating any Entities, during a bulk load I can push data into
> the protocol layer without creating any Entities.
> As you know, an Entity lives only in client application code, and
> during a bulk load there is no need to create any of them.
> Each Entity has a map of String names and values. If you
> have tens of thousands of these, it really consumes CPU and
> memory. So, just as I got a 20% performance improvement during
> extraction, I expect to get a performance improvement
> during loading.

This is exactly how remote_api works, which the bulk loader uses.

-Nick Johnson

> Regards,
>
> Richard
>
>
> Nick Johnson (Google) wrote:
>> On Wed, Jul 15, 2009 at 6:14 PM, richard
>> emberson<[email protected]> wrote:
>>> Nick,
>>>
>>> Thank you for the response.
>>>
>>> I have tens of thousands of records to load. If I load them
>>> all at once or "rate-limit" the load, won't I run out of the
>>> short-term quotas just the same? Or did you mean that I
>>> ought to rate-limit my load over a number of days or weeks?
>>
>> The problem you're running into is that you're loading so rapidly
>> you're hitting the very short-term quotas intended to prevent you
>> from consuming all your daily quota at once. If you rate-limit
>> enough, you can avoid hitting the short-term quotas while still
>> staying within the daily quotas. Whether you need to rate-limit
>> enough to spread the load over more than one day, or buy extra
>> quota, is another issue.
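Since there is no API for the short-term quotas, the "run until you get over-quota errors, then back off" advice above can be sketched on the client side. This is a minimal illustration, not App Engine SDK code: the endpoint URL and CSV payload handling are placeholders, and the 1s-to-60s delay schedule is an assumed starting point to tune. Only the 403 (SC_FORBIDDEN) handling and the doubling delay are the point.

```java
import java.net.HttpURLConnection;
import java.net.URL;

/** Sketch of a quota-aware uploader: retry with exponential backoff
 *  whenever the server answers 403, the over-quota response seen in
 *  this thread. Endpoint and payload are hypothetical placeholders. */
public class BackoffUploader {

    /** Delay before the nth consecutive retry: doubles each time, capped. */
    static long backoffMillis(int consecutiveFailures) {
        long base = 1_000L;   // start at 1 second (assumed, tune as needed)
        long cap = 60_000L;   // never wait more than a minute
        return Math.min(base << Math.min(consecutiveFailures, 6), cap);
    }

    /** POST one CSV block, backing off and retrying while over quota. */
    static void upload(URL endpoint, byte[] csvBlock) throws Exception {
        int failures = 0;
        while (true) {
            HttpURLConnection conn = (HttpURLConnection) endpoint.openConnection();
            conn.setDoOutput(true);
            conn.getOutputStream().write(csvBlock);
            int status = conn.getResponseCode();
            if (status != HttpURLConnection.HTTP_FORBIDDEN) {
                return;  // accepted (or a non-quota error to handle elsewhere)
            }
            Thread.sleep(backoffMillis(failures++));  // over quota: wait, retry
        }
    }
}
```

The pure `backoffMillis` schedule grows 1s, 2s, 4s, ... up to the 60s cap, which keeps a long-running loader from hammering the short-term buckets once it trips them.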
>>
>>> I am trying to determine whether, with large datasets, GAE is an
>>> adequate platform on which to host the application I have in mind.
>>> Currently, I am doing an evaluation. I've not yet built the
>>> application because I want to know if GAE has adequate performance.
>>>
>>> I have already rewritten the client-side code that extracts the
>>> data from the protocol layer and achieved a 20% performance
>>> increase over the shipped 1.2.2 SDK on the production GAE server
>>> (my new code was only 12% to 15% faster on the local development
>>> server, so 20% was unexpected).
>>> So, performance is critical for me - performance against
>>> large datasets.
>>>
>>> I don't know if the Python bulk loader will be an improvement.
>>> I ship the data up as CSV blocks, which are parsed into Entities
>>> and then stored. Pretty simple.
>>
>> The Python bulk loader does all the translation into entities on the
>> client side, and then uses remote_api to send the encoded data over.
>> This inevitably leads to less CPU utilization than parsing it
>> yourself on the server. Nevertheless, the main reason I recommended
>> the Python bulk loader is that it has support for concurrency and
>> rate-limiting built right in.
>>
>>> Concerning the speed of deleting existing data: you suggested
>>> using keys-only queries. In my initial email that you responded
>>> to, I had a short code snippet where, indeed, I set the query to
>>> use keys only. So, was the code incorrect?
>>
>> Sorry, I didn't read the snippet in enough detail.
>>
>> -Nick Johnson
>>
>>> Richard Emberson
>>>
>>>
>>> Nick Johnson (Google) wrote:
>>>> Hi Richard,
>>>>
>>>> You're running into short-term quotas, which are designed to
>>>> prevent you from exhausting your entire quota for the day in one
>>>> go. You need to rate-limit your bulk-loading code, and/or pay for
>>>> additional quota. Even enabling billing without setting a high
>>>> limit will increase your short-term quotas automatically.
>>>>
>>>> You should also look at your bulk-loading code and make sure it's
>>>> as efficient as possible. One possibility is to use the Python
>>>> bulkloader.
>>>>
>>>> As far as deletion goes, make sure you are doing keys-only queries
>>>> to get the keys to delete, which will save on CPU time and help
>>>> avoid timeouts.
>>>>
>>>> -Nick Johnson
>>>>
>>>> On Wed, Jul 15, 2009 at 12:11 AM, richard
>>>> emberson<[email protected]> wrote:
>>>>> So, once again, I've tried to upload some data.
>>>>>
>>>>> After, I guess, a couple thousand records I start getting
>>>>> HttpServletResponse.SC_FORBIDDEN from the App Engine server.
>>>>>
>>>>> On the Dashboard it says:
>>>>>
>>>>>   Your application is exceeding a quota: CPU Time
>>>>>   Your application is exceeding a quota: Datastore CPU Time
>>>>>
>>>>> but under Resources, CPU Time usage is at 34%
>>>>> and Stored Data usage is at 4%.
>>>>>
>>>>> I am trying to develop an application on GAE.
>>>>> I will need to load tens of thousands, or a couple of hundred
>>>>> thousand, entities as part of testing the application. I will
>>>>> then want to delete those entities.
>>>>>
>>>>> Currently, I can only load a couple of hundred before App Engine
>>>>> starts rejecting additional uploads. And I cannot delete any of
>>>>> them - I keep getting timeouts, even if I try to delete only 10.
>>>>>
>>>>> Is there some upload-per-minute quota or something?
>>>>> And what's the magic to delete stuff?
>>>>>
>>>>> The following code causes timeouts:
>>>>>
>>>>>   DatastoreService ds = DatastoreServiceFactory.getDatastoreService();
>>>>>   final Query q = new Query(kindName);
>>>>>   q.setKeysOnly();
>>>>>
>>>>>   final Iterable<Entity> entities = ds.prepare(q).asIterable(
>>>>>       FetchOptions.Builder.withLimit(count));
>>>>>   KeyIterable ki = new KeyIterable(entities);
>>>>>   ds.delete(ki);
>>>>>   int numberDeleted = ki.getCount();
>>>>>   return numberDeleted;
>>>>>
>>>>> Richard
>>>>>
>>>>> --
>>>>> Quis custodiet ipsos custodes
>>>>>
>>>
>>> --
>>> Quis custodiet ipsos custodes
>>>
>
> --
> Quis custodiet ipsos custodes
>

--
Nick Johnson, App Engine Developer Programs Engineer
Google Ireland Ltd. :: Registered in Dublin, Ireland, Registration Number: 368047
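The delete code above times out because a single `ds.delete(...)` call covers the whole result set in one request. A common workaround, sketched below under assumptions: fetch the keys with a keys-only query, then delete them in small bounded batches so each call finishes well inside the request deadline. The batch size of 500 is an assumed tuning figure, not a documented limit, and the datastore calls appear only in comments since they require the App Engine SDK; the batching helper itself is plain Java.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: issue many small delete calls instead of one huge one, so
 *  each request stays short. partition() is a hypothetical helper, not
 *  part of any App Engine API. */
public class BatchDelete {

    /** Split a list into consecutive batches of at most batchSize elements. */
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        List<List<T>> batches = new ArrayList<List<T>>();
        for (int i = 0; i < items.size(); i += batchSize) {
            batches.add(items.subList(i, Math.min(i + batchSize, items.size())));
        }
        return batches;
    }

    // Intended use against the datastore (needs the App Engine SDK):
    //
    //   Query q = new Query(kindName);
    //   q.setKeysOnly();                      // fetch keys, not full entities
    //   List<Key> keys = new ArrayList<Key>();
    //   for (Entity e : ds.prepare(q).asIterable()) {
    //       keys.add(e.getKey());
    //   }
    //   for (List<Key> batch : partition(keys, 500)) {  // 500: assumed size
    //       ds.delete(batch);                 // one short RPC per batch
    //   }
}
```

Keys-only queries (as in Richard's snippet) are the right first step; bounding the size of each delete RPC is the second, and it also gives a natural place to insert the backoff delay discussed earlier in the thread.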
