Yeah, we are also looking forward to HR, but we need a major redesign of
some components to lower the CPU usage (the aim is to pay less, actually).

I already have one HR instance with a small server that does random inserts.
I think I'm going to leave JMeter running for 2 days and compare the graphs.

On 21 February 2011 16:35, Robert Kluin <[email protected]> wrote:

> I would also like ideas / suggestions.
>
> I've seen similar error patterns on apps today, including on user-facing
> requests on M/S datastore apps.  I usually see large bursts of errors, and
> relatively few 'intermittent' issues.
>
> For task-heavy processing I lower the RPC limit to reduce the impact caused
> by latency spikes.  For user requests I've been trying to catch deadline,
> timeout, and application (5) errors and defer the write via task queue.
>
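
A minimal sketch of the catch-and-defer pattern described above, in plain
Python so it runs anywhere. `TransientDatastoreError`, `write_or_defer`, and
the `enqueue` callback are hypothetical names for illustration. In an App
Engine app they would presumably map to the datastore timeout, deadline, and
application errors and to a task queue call, but that mapping is an
assumption and is not taken from this thread.

```python
class TransientDatastoreError(Exception):
    """Hypothetical stand-in for timeout, deadline, and application errors."""


def write_or_defer(write, enqueue, *args):
    """Try the write inline; on a transient failure, hand it to a queue.

    `write` is the function that performs the datastore write.
    `enqueue` is a callback that schedules `write(*args)` to run later.
    Returns "written" or "deferred" so the caller can tell the user.
    """
    try:
        write(*args)
        return "written"
    except TransientDatastoreError:
        # No inline retry: the user request has a hard deadline, so the
        # write is pushed to background work that retries on its own.
        enqueue(write, *args)
        return "deferred"


if __name__ == "__main__":
    pending = []

    def flaky_write(key, value):
        raise TransientDatastoreError("simulated datastore timeout")

    status = write_or_defer(
        flaky_write, lambda fn, *a: pending.append((fn, a)), "counter", 1
    )
    print(status, len(pending))
```

Because a queued task can run more than once, the deferred write should be
idempotent, and the user may briefly see stale data until it runs.
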
>
> Would love to hear other ideas -- besides just move to HR :)
>
> Could anyone using HR datastore post today's error graph for comparison?
>
>
> Robert
>
>
>
>
>
> On Mon, Feb 21, 2011 at 16:31, Dmitry <[email protected]> wrote:
>
>> Hi All!
>>
>> I'm trying to figure out the reason for my datastore timeouts. I use the
>> master/slave datastore.
>>
>> As far as I can see, the possible reasons are:
>>
>>    -  contention issues
>>    - "A very small number of datastore operations – generally less than 1
>>    in 3000 – will result in a timeout in normal operation" (as per
>>    documentation)
>>
>>
>> In my case this is an acceptable error rate for background task operations
>> (which retry automatically). For example (today's stats), 127.58K tasks
>> caused 192 errors. Some of these may be contention errors.
>>
>> But for user operations I sometimes have a very high error rate (89 of 1.7K
>> requests failed with a datastore timeout).
>>
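
To put the two figures above next to the documented "less than 1 in 3000"
rate, here is a small worked calculation. The helper name `rate_vs_baseline`
is made up for illustration, and the totals use the rounded counts quoted in
the message (127.58K and 1.7K), so the results are approximate.

```python
BASELINE = 1 / 3000  # documented "normal" timeout rate for datastore operations


def rate_vs_baseline(errors, total, baseline=BASELINE):
    """Return (error rate, how many times the baseline that rate is)."""
    rate = errors / total
    return rate, rate / baseline


samples = [
    ("background tasks", 192, 127580),
    ("user requests", 89, 1700),
]

for label, errors, total in samples:
    rate, factor = rate_vs_baseline(errors, total)
    print("%-17s %.3f%%  about %.1fx the documented rate" % (label, rate * 100, factor))
```

With these inputs the task error rate comes out at roughly 4.5 times the
documented rate, and the user-request error rate at roughly 157 times. Note
that the documented figure is per datastore operation while the counts here
are per request, so a request that performs several operations is expected to
sit somewhat above 1 in 3000.
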
>>    - I'm pretty sure I'm not trying to update the same entity group within
>>    the same minute (let alone the same second)
>>    - the transaction is small: get by key, put with the updated value, and
>>    enqueue a new task to update stats within the transaction
>>    - I cannot retry the operation within the 30-second user request... I've
>>    added retry code, but it fails earlier
>>    - I cannot find any error patterns
>>
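
One way to keep retries inside a user request is to give them an explicit
time budget well below the 30-second limit. The sketch below is a generic
illustration, not the retry code mentioned above: `TransientDatastoreError`
and `retry_within_budget` are hypothetical names, and the budget and delay
values are arbitrary.

```python
import time


class TransientDatastoreError(Exception):
    """Hypothetical stand-in for a datastore timeout."""


def retry_within_budget(op, budget_s, first_delay_s=0.1,
                        clock=time.time, sleep=time.sleep):
    """Call `op` until it succeeds or the time budget would be exceeded.

    The delay doubles after each failure. If waiting for the next attempt
    would reach the deadline, the last error is re-raised so the caller can
    fall back to something else (for example, deferring the write).
    """
    deadline = clock() + budget_s
    delay = first_delay_s
    while True:
        try:
            return op()
        except TransientDatastoreError:
            if clock() + delay >= deadline:
                raise
            sleep(delay)
            delay *= 2
```

The budget only limits the waiting between attempts. A single slow call to
`op` can still run past it, so the budget should leave headroom for one
worst-case attempt.
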
>> My questions are:
>>
>>    1. If this is a task queue issue, will the transaction fail with a
>>    datastore timeout?
>>    2. Does the High Replication datastore have a different "normal" error
>>    rate (1 in 3000 for master/slave)?
>>
>> Just as a test, I created 2 applications (master/slave and HR) and ran my
>> code (~40K task requests and random user actions). But the results are not
>> conclusive:
>> - master/slave failed 1 time with a datastore error
>> - no errors with HR
>> This may be due to the small amount of data (~200MB in my test, possibly
>> only 1 tablet used). In the real system I have around 190GB now.
>>
>> The error distribution (today 21/02):
>>
>>
>> <https://lh3.googleusercontent.com/_nXv-kmjg1BQ/TWLXXDQAwrI/AAAAAAAAQno/wK-1Kh2Z_Qw/gae_erors.png>
>>
>> The big issue is that these errors are visible to customers. Any
>> suggestions?
>>
>> Thanks!
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "Google App Engine" group.
>> To post to this group, send email to [email protected].
>> To unsubscribe from this group, send email to
>> [email protected].
>> For more options, visit this group at
>> http://groups.google.com/group/google-appengine?hl=en.
>>
>



-- 
http://about.me/david.mora
