2 thoughts:

 - remember that when servicing multiple simultaneous requests the CPU is 
switching between them.  I bet the pauses are (in part) due to waiting for 
a turn on the CPU - your request got "paused" while the memcache call was 
made, and then had to wait for CPU once the memcache response was ready.
 - based on the description of what the pages are doing, have you tried 
using "Edge Cache"?  it's very under-documented, but when the state of a 
page is the same for many users it's a big $$$ saver.  see 
https://code.google.com/p/googleappengine/issues/detail?id=2258 and 
https://cloud.google.com/appengine/docs/python/how-requests-are-handled#cache-control_expires_and_vary
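As a minimal, runnable sketch of the header that makes a page eligible for that edge caching (the JDK's built-in HttpServer is used here only so the example is self-contained; on App Engine you would set the same header on your framework's response object, and note that setting cookies disables edge caching):

```java
// Sketch: a "public" Cache-Control with a max-age lets Google's edge serve
// the same bytes to many users without hitting your instances.
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class EdgeCacheHeaders {
    public static void main(String[] args) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
        server.createContext("/", exchange -> {
            byte[] body = "same page for everyone".getBytes(StandardCharsets.UTF_8);
            // Cacheable by shared caches (the edge) for 5 minutes.
            exchange.getResponseHeaders().set("Cache-Control", "public, max-age=300");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();

        // Fetch our own endpoint and show the header the edge cache keys on.
        URL url = new URL("http://localhost:" + server.getAddress().getPort() + "/");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        System.out.println("Cache-Control: " + conn.getHeaderField("Cache-Control"));
        conn.getInputStream().close();
        server.stop(0);
    }
}
```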

good luck!

cfh

On Tuesday, July 5, 2016 at 11:24:46 AM UTC-7, Nick (Cloud Platform 
Support) wrote:
>
> Hey Folks,
>
> As for the number of concurrent requests an instance can handle, it 
> depends on CPU usage on the instance, while the distribution of requests 
> across instances depends on statistical trends in latency. It's possible 
> to see variable concurrent-request performance depending on how requests 
> use up the resources for the given instance class 
> <https://cloud.google.com/appengine/docs/about-the-standard-environment?authuser=0#instance_classes>
>  
> and the latency statistics of requests on an instance.
>
> I have one small recommendation relating to the mysterious gaps of time in 
> requests. Surround calls to complex libraries, or any calls which require 
> network activity, with System.currentTimeMillis() 
> <https://docs.oracle.com/javase/7/docs/api/java/lang/System.html#currentTimeMillis()>
>  
> calls (this is for Java, but other runtimes have equivalent system calls), 
> and you should be able to determine what exactly is taking up that time. 
> It might be something optimize-able, or it might be a network issue; 
> depending on the nature of the network call itself, even that may be 
> optimize-able.
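A minimal sketch of that timing approach (slowCall() below is a hypothetical stand-in for a memcache, urlfetch, or complex-library call):

```java
// Sketch: bracket a suspect call with System.currentTimeMillis() to see
// where the request time actually goes.
public class TimingProbe {

    // Hypothetical stand-in for a network-bound call.
    static void slowCall() throws InterruptedException {
        Thread.sleep(50);
    }

    public static void main(String[] args) throws Exception {
        long start = System.currentTimeMillis();
        slowCall();
        long elapsed = System.currentTimeMillis() - start;
        // In a real handler you would write this to the request log so the
        // gaps show up next to the request trace.
        System.out.println("slowCall took " + elapsed + " ms");
    }
}
```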
>
> Regards,
>
> Nick
> Cloud Platform Community Support
>
> On Friday, July 1, 2016 at 3:28:51 AM UTC-4, Thomas Taschauer wrote:
>>
>> One thing I noticed is that the first request(s?) served by a fresh 
>> instance will always be really slow. Not that they stay in the request 
>> queue for a longer time (which is expected behaviour of course), but they 
>> have long "pauses" in the middle of the request as you mentioned before, 
>> usually up to 5 seconds in my case.
>>
>> What I'm going to test next is upgrading to F2 - hoping for smaller 
>> pauses due to a faster CPU - and reverting other scaling-options to default 
>> (used max_concurrent_requests and max_idle_instances before) hoping for the 
>> App Engine scaler to figure it out by itself. :)
>>
>> On Thursday, June 30, 2016 at 1:13:42 PM UTC+2, troberti wrote:
>>>
>>> Great to hear that it helps. Actually, if you are using F4s, I might try 
>>> a slightly higher max_concurrent_requests, say 4. Again, test and compare 
>>> to be sure.
>>>
>>> Finally, to reduce costs, I would recommend setting max_idle_instances to 
>>> 1. Keep min_idle_instances at what you need for your application. For us 
>>> this reduces cost significantly without any apparent drawbacks.
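In app.yaml terms, the settings discussed above would look roughly like this (values illustrative, taken from the numbers in this thread; a sketch, not a drop-in config):

```yaml
automatic_scaling:
  max_concurrent_requests: 4
  min_idle_instances: 1   # keep whatever your application actually needs
  max_idle_instances: 1
```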
>>>
>>> On Thursday, June 30, 2016 at 11:44:34 AM UTC+2, Trevor wrote:
>>>>
>>>> Well, I have to say thank you very, very much. Thanks to your advice we 
>>>> have our lowest latency in 3 years! Sub 300ms average.  As expected 
>>>> though, 
>>>> we are now sitting on 21 billed f4 instances, which will potentially cost 
>>>> us in the order of 3x our current ($30-40 -> $100+), but we will tweak 
>>>> that 
>>>> from tomorrow onwards. Peak hour is about to hit so we are going to see if 
>>>> the system can keep sub-300ms at the current "automatic" setting for 
>>>> scaling. But yes, once again, thank you for solving in 5 minutes what I 
>>>> have been working on for 2 weeks (my tears are from joy and sadness 
>>>> all at once).
>>>>
>>>>
>>>> <https://lh3.googleusercontent.com/-eEUuw3hSLYU/V3Tox-bhe6I/AAAAAAAAQWM/zPzgBRJkRHcoBSPmVrP2xsmN2FDK6Yl_wCLcB/s1600/Screen%2BShot%2B2016-06-30%2Bat%2B18.37.20.png>
>>>>
>>>>
>>>> <https://lh3.googleusercontent.com/-4c7xvBsQ_tk/V3TpBfSBUWI/AAAAAAAAQWU/0tgD4v43X44D5Q-gBULBeQu11KIApRPYQCLcB/s1600/Screen%2BShot%2B2016-06-30%2Bat%2B18.39.51.png>
>>>>
>>>>
>>>> On Thursday, June 30, 2016 at 6:03:23 PM UTC+9, troberti wrote:
>>>>>
>>>>> Right, you should definitely test and see what the results are. My 
>>>>> first inclination was also to increase max_concurrent_requests, but 
>>>>> because 
>>>>> then all those requests have increased latency, the actual QPS per 
>>>>> instance 
>>>>> decreased! Lowering max_concurrent_requests decreased request latency, so 
>>>>> each instance could process more requests/second.
>>>>>
>>>>> We use F1 instances, because we do not need the additional memory, and 
>>>>> our requests perform mostly RPCs. In our testing, faster instance classes 
>>>>> do process requests faster, but also cost significantly more.  F1s 
>>>>> provide 
>>>>> the best performance/cost ratio for us. This could be a Python thing, not 
>>>>> sure. Again, you should really test and figure out what is best for 
>>>>> your application+runtime.
>>>>>
>>>>
