Nick,

Again, thank you for your response.

Two items.

First item:
Is there some Java API to the short-term quotas so that I can write
code that "knows" how fast to go and when to stop? If not, are the
upload short-term rate limits published somewhere? Are they based on
number of requests (which I can control), number of bytes stored per
minute (which I can control), GAE CPU time (which I cannot measure),
or some other measure?

Second item:
Thanks for talking about the Python bulk loader. It gave me an idea.
Just as I extract data from the protocol layer without creating any
Entities, during a bulk load I can push data into the protocol layer
without creating any Entities. As you know, an Entity only lives in
client application code, and during a bulk load there is no need to
create any of them. Each Entity has a map with String names and values.
If you have tens of thousands of these, it really consumes CPU and
memory. So, just as I got a 20% performance improvement during
extraction, I expect to get a performance improvement during loading.

Regards,
Richard

Nick Johnson (Google) wrote:
> On Wed, Jul 15, 2009 at 6:14 PM, richard
> emberson<[email protected]> wrote:
>> Nick,
>>
>> Thank you for the response.
>>
>> I have tens of thousands of records to load. If I load them
>> all at once or "rate-limit" the load, won't I run out of the
>> short term quotas just the same? Or did you mean that I
>> ought to "rate-limit" my load over a number of days or weeks?
>
> The problem you're running into is loading so rapidly you're hitting
> the very short term quotas intended to prevent you consuming all your
> daily quota at once. If you rate-limit enough, you can avoid hitting
> the short-term quotas while still staying within the daily quotas.
> Whether or not you need to rate limit enough to cover more than one
> day, or buy extra quota, is another issue.
>
>> I am trying to determine if with large datasets, GAE is an
>> adequate platform onto which the application I have in mind
>> can be hosted.
>> Currently, I am doing an evaluation. I've
>> not yet built the application because I want to know if
>> GAE has adequate performance.
>>
>> I have already rewritten the client-side code that
>> extracts the data from the protocol layer and achieved a
>> 20% performance increase over the shipped 1.2.2 SDK on the
>> production GAE server (my new code was only 12% to 15%
>> faster on the
>> local development server so 20% was unexpected).
>> So, performance is critical for me - performance against
>> large datasets.
>>
>> I don't know if the Python bulkloader will be an improvement.
>> I ship the data up as CSV blocks which are parsed into Entities
>> and then stored. Pretty simple.
>
> The Python bulk loader does all the translation into entities on the
> client side, and then uses remote_api to send the encoded data over.
> This inevitably leads to less CPU utilization than parsing it yourself
> on the server. Nevertheless, the main reason I recommended the Python
> bulk loader is because it has support for concurrency and
> rate-limiting built right in.
>
>> Concerning the speed of deleting existing data. You suggested
>> using key-only queries. In my initial email that you responded
>> to, I had a short code snippet where, indeed, I set the
>> query to use keys only. So, was the code incorrect?
>
> Sorry, I didn't read the snippet in enough detail.
>
> -Nick Johnson
>
>> Richard Emberson
>>
>>
>> Nick Johnson (Google) wrote:
>>> Hi Richard,
>>>
>>> You're running into short term quotas, which are designed to prevent
>>> you exhausting your entire quota for the day in one go. You need to
>>> rate-limit your bulk loading code, and/or pay for additional quota.
>>> Even enabling billing without setting a high limit will increase your
>>> short term quotas automatically.
>>>
>>> You should also look at your bulk loading code and make sure it's as
>>> efficient as possible. One possibility is to use the Python
>>> bulkloader.
>>>
>>> As far as deletion goes, make sure you are doing key-only queries to
>>> get the key to delete, which will save on CPU time and timeouts.
>>>
>>> -Nick Johnson
>>>
>>> On Wed, Jul 15, 2009 at 12:11 AM, richard
>>> emberson<[email protected]> wrote:
>>>> So, once again, I've tried to upload some data.
>>>>
>>>> After a couple, I guess, thousand records I start
>>>> getting HttpServletResponse.SC_FORBIDDEN from
>>>> the app engine server.
>>>>
>>>> On the Dashboard it says:
>>>>
>>>> Your application is exceeding a quota: CPU Time
>>>> Your application is exceeding a quota: Datastore CPU Time
>>>>
>>>> but under Resource, CPU Time usage is at 34%
>>>> and Stored Data usage is at 4%.
>>>>
>>>> I am trying to develop an application on GAE.
>>>> I will need to load tens of thousands or
>>>> a couple of hundred thousand entities as part
>>>> of testing the application. I will then want
>>>> to delete those entities.
>>>>
>>>> Currently, I can only load a couple of hundred
>>>> before the app engine starts rejecting additional
>>>> uploads. And I can not delete any of them - I
>>>> keep getting timeouts - even if I try to delete only
>>>> 10.
>>>>
>>>> Is there some upload per minute quota or something?
>>>> And, what's the magic to delete stuff.
>>>>
>>>> The following code causes timeouts:
>>>>
>>>> DatastoreService ds = DatastoreServiceFactory.getDatastoreService();
>>>> final Query q = new Query(kindName);
>>>> q.setKeysOnly();
>>>>
>>>> final Iterable<Entity> entities = ds.prepare(q).asIterable(
>>>>     FetchOptions.Builder.withLimit(count));
>>>> KeyIterable ki = new KeyIterable(entities);
>>>> ds.delete(ki);
>>>> int numberDeleted = ki.getCount();
>>>> return numberDeleted;
>>>>
>>>>
>>>>
>>>> Richard
>>>>
>>>> --
>>>> Quis custodiet ipsos custodes
>>>>
>>>
>>>
>> --
>> Quis custodiet ipsos custodes
>>
>
>
>
--
Quis custodiet ipsos custodes

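For what it's worth, the "rate-limit the load" advice in this thread could be sketched on the client side roughly as below. This is only an illustration, not anything from the App Engine SDK: the class name UploadPacer, its methods, and the numbers used with it are all hypothetical, and the thread does not say what the short-term limits actually are, so the starting rate is a guess you would have to tune. The idea is to space requests evenly, double the spacing whenever the server rejects an upload, and ease back toward the starting rate while uploads succeed.

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical client-side pacer (not part of any App Engine API).
 * Spaces requests evenly and backs off after a rejected upload.
 */
final class UploadPacer {
    private final long baseIntervalNanos;   // spacing at the configured rate
    private final long maxIntervalNanos;    // upper bound on the spacing
    private long currentIntervalNanos;      // spacing currently in effect
    private long nextSlotNanos;             // earliest time the next request may go
    private boolean started;                // false until the first reservation

    UploadPacer(double requestsPerSecond, long maxIntervalMillis) {
        if (requestsPerSecond <= 0.0) {
            throw new IllegalArgumentException("requestsPerSecond must be positive");
        }
        baseIntervalNanos = (long) (1_000_000_000L / requestsPerSecond);
        maxIntervalNanos = Math.max(baseIntervalNanos,
                TimeUnit.MILLISECONDS.toNanos(maxIntervalMillis));
        currentIntervalNanos = baseIntervalNanos;
    }

    /** Reserves the next send slot and returns how long to wait, in nanoseconds. */
    synchronized long reserve(long nowNanos) {
        long wait = started ? Math.max(0L, nextSlotNanos - nowNanos) : 0L;
        started = true;
        nextSlotNanos = nowNanos + wait + currentIntervalNanos;
        return wait;
    }

    /** Call after a rejected upload: doubles the spacing, up to the maximum. */
    synchronized void onRejected() {
        currentIntervalNanos = Math.min(maxIntervalNanos, currentIntervalNanos * 2);
    }

    /** Call after a successful upload: halves the spacing, down to the base. */
    synchronized void onAccepted() {
        currentIntervalNanos = Math.max(baseIntervalNanos, currentIntervalNanos / 2);
    }

    synchronized long intervalNanos() {
        return currentIntervalNanos;
    }

    /** Sleeps until the next send slot is due. */
    void pause() throws InterruptedException {
        TimeUnit.NANOSECONDS.sleep(reserve(System.nanoTime()));
    }
}
```

In an upload loop you would call pause() before sending each block, then onRejected() if the response status is 403 (the HttpServletResponse.SC_FORBIDDEN seen above) or onAccepted() otherwise. This only reacts to rejections; it cannot know the quota in advance, which is exactly the gap the "First item" question is asking about.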