On Wed, Jul 15, 2009 at 7:03 PM, richard
emberson<[email protected]> wrote:
>
> Nick,
>
> Again, thank you for your response.
>
> Two Items
>
> First Item:
> Is there some Java API to the short-term quotas so that I can
> write code that "knows" how fast to go and when to stop?
> If not, are the upload short-term rate limits published
> somewhere? Are they based on number of requests (which I can
> control), number of bytes stored per minute (which I can
> control), GAE CPU time (which I cannot measure) or some
> other measure?

Not currently, no. Your best bet is to write your uploader to run
until it gets over-quota errors, then back off a bit. The rates apply
to all the quotas that are accounted for on App Engine, and constitute
a fraction of your available daily quota for each. Increasing your
daily limits through billing will also increase the short-term
buckets. In some cases, simply enabling billing raises the limits.
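A minimal sketch of that "back off a bit" loop, in plain Java. The
`Batch` and `OverQuotaException` types here are hypothetical stand-ins
for whatever your uploader uses to send a chunk and to signal the
over-quota response (e.g. the SC_FORBIDDEN you're seeing):

```java
import java.util.concurrent.TimeUnit;

/** Retries one upload batch with exponential back-off when the server
 *  reports an over-quota error. Sketch only: Batch and
 *  OverQuotaException are placeholders for your uploader's own types. */
class BackoffUploader {
    static final long MAX_DELAY_MS = 60_000;  // cap the back-off at 1 minute

    interface Batch { void send() throws OverQuotaException; }
    static class OverQuotaException extends Exception {}

    /** Returns true once the batch is accepted, false after maxAttempts. */
    static boolean sendWithBackoff(Batch batch, long initialDelayMs,
                                   int maxAttempts) {
        long delay = initialDelayMs;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                batch.send();
                return true;                       // accepted within quota
            } catch (OverQuotaException e) {
                try {
                    TimeUnit.MILLISECONDS.sleep(delay);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    return false;
                }
                delay = Math.min(delay * 2, MAX_DELAY_MS);  // double, capped
            }
        }
        return false;                              // give up; resume later
    }
}
```

Doubling the delay keeps the uploader from hammering the short-term
bucket while it refills; a failed run can simply be resumed later from
the last accepted batch.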

>
>
> Second Item:
> Thanks for talking about the Python bulk loader. It gave me
> an idea. Just as I extract data from the protocol layer without
> creating any Entities, during a bulk load I can push data into
> the protocol layer without creating any Entities.
> As you know, Entities only live in client application code, and
> during a bulk load there is no need to create any of them.
> Each Entity has a map of String names and values. If you
> have tens of thousands of these, it really consumes CPU and
> memory. So, just as I got a 20% performance improvement during
> extraction, I expect to get a performance improvement
> during loading.

This is exactly how remote_api works, which the Bulk Loader uses.
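Whichever layer the rows end up in, the server-side shape of the CSV
load Richard describes is the same: parse each row into the name/value
map an Entity carries, and group rows into batches so each datastore
put is one round-trip. A plain-Java sketch of that batching (no App
Engine classes, so the actual put call is left out; the 500-row batch
size is just an illustrative choice):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Parses CSV lines into property maps and groups them into fixed-size
 *  batches, one batch per datastore put. Sketch only: no quoted-field
 *  handling, and the storage call itself is omitted. */
class CsvBatcher {
    /** Maps one CSV line onto the header's column names. */
    static Map<String, String> parseRow(String[] header, String line) {
        String[] cells = line.split(",", -1);
        Map<String, String> props = new LinkedHashMap<>();
        for (int i = 0; i < header.length && i < cells.length; i++) {
            props.put(header[i], cells[i]);
        }
        return props;
    }

    /** Groups parsed rows into batches of at most batchSize. */
    static List<List<Map<String, String>>> toBatches(
            String[] header, List<String> lines, int batchSize) {
        List<List<Map<String, String>>> batches = new ArrayList<>();
        List<Map<String, String>> current = new ArrayList<>();
        for (String line : lines) {
            current.add(parseRow(header, line));
            if (current.size() == batchSize) {
                batches.add(current);
                current = new ArrayList<>();
            }
        }
        if (!current.isEmpty()) batches.add(current);  // trailing partial batch
        return batches;
    }
}
```

Batching is worth doing regardless of which layer builds the records:
the per-call overhead dominates when tens of thousands of rows are
stored one at a time.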

-Nick Johnson

>
> Regards,
>
> Richard
>
>
> Nick Johnson (Google) wrote:
>> On Wed, Jul 15, 2009 at 6:14 PM, richard
>> emberson<[email protected]> wrote:
>>> Nick,
>>>
>>> Thank you for the response.
>>>
>>> I have tens of thousands of records to load. If I load them
>>> all at once or "rate-limit" the load, won't I run out of the
>>> short-term quotas just the same? Or did you mean that I
>>> ought to "rate-limit" my load over a number of days or weeks?
>>
>> The problem you're running into is loading so rapidly that you're
>> hitting the very short-term quotas intended to prevent you from
>> consuming all your daily quota at once. If you rate-limit enough, you
>> can avoid hitting the short-term quotas while still staying within
>> the daily quotas. Whether or not you need to rate-limit enough to
>> cover more than one day, or buy extra quota, is another issue.
>>
>>> I am trying to determine if with large datasets, GAE is an
>>> adequate platform onto which the application I have in mind
>>> can be hosted. Currently, I am doing an evaluation. I've
>>> not yet built the application because I want to know if
>>> GAE has adequate performance.
>>>
>>> I have already rewritten the client-side code that
>>> extracts the data from the protocol layer and achieved a
>>> 20% performance increase over the shipped 1.2.2 SDK on the
>>> production GAE server (my new code was only 12% to 15%
>>> faster on the local development server, so 20% was unexpected).
>>> So, performance is critical for me - performance against
>>> large datasets.
>>>
>>> I don't know if the Python bulkloader will be an improvement.
>>> I ship the data up as CSV blocks which are parsed into Entities
>>> and then stored. Pretty simple.
>>
>> The Python bulk loader does all the translation into entities on the
>> client side, and then uses remote_api to send the encoded data over.
>> This inevitably leads to less CPU utilization than parsing it yourself
>> on the server. Nevertheless, the main reason I recommended the Python
>> bulk loader is because it has support for concurrency and
>> rate-limiting built right in.
>>
>>> Concerning the speed of deleting existing data: you suggested
>>> using key-only queries. In my initial email that you responded
>>> to, I had a short code snippet where, indeed, I set the
>>> query to use keys only. So, was the code incorrect?
>>
>> Sorry, I didn't read the snippet in enough detail.
>>
>> -Nick Johnson
>>
>>> Richard Emberson
>>>
>>>
>>> Nick Johnson (Google) wrote:
>>>> Hi Richard,
>>>>
>>>> You're running into short-term quotas, which are designed to prevent
>>>> you from exhausting your entire quota for the day in one go. You need
>>>> to rate-limit your bulk loading code and/or pay for additional quota.
>>>> Even enabling billing without setting a high limit will increase your
>>>> short-term quotas automatically.
>>>>
>>>> You should also look at your bulk loading code and make sure it's as
>>>> efficient as possible. One possibility is to use the Python
>>>> bulkloader.
>>>>
>>>> As far as deletion goes, make sure you are doing key-only queries to
>>>> get the key to delete, which will save on CPU time and timeouts.
>>>>
>>>> -Nick Johnson
>>>>
>>>> On Wed, Jul 15, 2009 at 12:11 AM, richard
>>>> emberson<[email protected]> wrote:
>>>>> So, once again, I've tried to upload some data.
>>>>>
>>>>> After, I guess, a couple thousand records I start
>>>>> getting HttpServletResponse.SC_FORBIDDEN from
>>>>> the App Engine server.
>>>>>
>>>>> On the Dashboard it says:
>>>>>
>>>>> Your application is exceeding a quota: CPU Time
>>>>> Your application is exceeding a quota: Datastore CPU Time
>>>>>
>>>>> but under Resource, CPU Time usage is at 34%
>>>>> and Stored Data usage is at 4%.
>>>>>
>>>>> I am trying to develop an application on GAE.
>>>>> I will need to load tens of thousands or
>>>>> a couple of hundred thousand entities as part
>>>>> of testing the application. I will then want
>>>>> to delete those entities.
>>>>>
>>>>> Currently, I can only load a couple of hundred
>>>>> before App Engine starts rejecting additional
>>>>> uploads. And I cannot delete any of them - I
>>>>> keep getting timeouts - even if I try to delete only
>>>>> 10.
>>>>>
>>>>> Is there some per-minute upload quota or something?
>>>>> And what's the magic to delete stuff?
>>>>>
>>>>> The following code causes timeouts:
>>>>>
>>>>>     DatastoreService ds = DatastoreServiceFactory.getDatastoreService();
>>>>>     // Key-only query: fetches just the keys, not the entity bodies.
>>>>>     final Query q = new Query(kindName);
>>>>>     q.setKeysOnly();
>>>>>
>>>>>     final Iterable<Entity> entities = ds.prepare(q).asIterable(
>>>>>                 FetchOptions.Builder.withLimit(count));
>>>>>     // Collect the keys, then delete them in a single batch call.
>>>>>     final List<Key> keys = new ArrayList<Key>();
>>>>>     for (Entity e : entities) {
>>>>>         keys.add(e.getKey());
>>>>>     }
>>>>>     ds.delete(keys);
>>>>>     return keys.size();
>>>>>
>>>>>
>>>>>
>>>>> Richard
>>>>>
>>>>> --
>>>>> Quis custodiet ipsos custodes
>>>>>
>>>>
>>>>
>>> --
>>> Quis custodiet ipsos custodes
>>>
>>
>>
>>
>
> --
> Quis custodiet ipsos custodes
>
> >
>



-- 
Nick Johnson, App Engine Developer Programs Engineer
Google Ireland Ltd. :: Registered in Dublin, Ireland, Registration
Number: 368047

