Yeah, we are also looking forward to HR, but we need a major redesign of some components to lower CPU usage (the aim is actually to pay less).
I have one HR instance already, with a small server that does random inserts. I think I'm going to leave JMeter running for 2 days and compare the graphs.

On 21 February 2011 16:35, Robert Kluin <[email protected]> wrote:

> I would also like ideas / suggestions.
>
> I've seen similar error patterns on apps today, also on user-facing requests
> on M/S datastore apps. I usually see large bursts of errors and relatively
> few 'intermittent' issues.
>
> For task-heavy processing I lower the RPC limit to reduce the impact caused
> by latency spikes. For user requests I've been trying to catch deadline,
> timeout, and application (5) errors and defer the write via the task queue.
>
> Would love to hear other ideas -- besides just moving to HR :)
>
> Could anyone using the HR datastore post today's error graph for comparison?
>
> Robert
>
> On Mon, Feb 21, 2011 at 16:31, Dmitry <[email protected]> wrote:
>
>> Hi All!
>>
>> I'm trying to figure out the cause of my datastore timeouts. I use the
>> master/slave datastore.
>>
>> As far as I can see, the possible reasons are:
>>
>> - contention issues
>> - "A very small number of datastore operations – generally less than 1
>>   in 3000 – will result in a timeout in normal operation" (as per the
>>   documentation)
>>
>> In my case this is an acceptable error rate for background task operations
>> (which retry automatically). For example (today's stats), 127.58K tasks
>> caused 192 errors. Some of these may be contention errors.
>>
>> But for user operations I sometimes have a very high error rate (89 of 1.7K
>> requests failed with a datastore timeout).
>>
>> - I'm pretty sure I'm not trying to update the same entity group in
>>   the same minute (not even the same second)
>> - the transaction is small: get by key, put with the updated value, run a
>>   new task to update stats within the transaction
>> - I cannot retry the operation within the 30-second user request... I've
>>   added retry code, but it fails earlier
>> - I cannot find any error patterns
>>
>> My questions are:
>>
>> 1. If this is a task queue issue, will the transaction fail with a
>>    datastore timeout?
>> 2. Is the "normal" error rate any different on the High Replication
>>    datastore (1 in 3000 for master/slave)?
>>
>> Just as a test I've created 2 applications (master/slave and HR) and
>> ran my code (~40K task requests and random user actions). But the results
>> are not conclusive:
>> - master/slave failed once with a datastore error
>> - no errors with HR
>> Maybe that's due to the small amount of data (~200MB in my test, possibly
>> only 1 tablet used). In the real system I have around 190GB now.
>>
>> The error distribution (today, 21/02):
>>
>> <https://lh3.googleusercontent.com/_nXv-kmjg1BQ/TWLXXDQAwrI/AAAAAAAAQno/wK-1Kh2Z_Qw/gae_erors.png>
>>
>> The big issue is that these errors are visible to customers. Any
>> suggestions?
>>
>> Thanks!
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "Google App Engine" group.
>> To post to this group, send email to [email protected].
>> To unsubscribe from this group, send email to
>> [email protected].
>> For more options, visit this group at
>> http://groups.google.com/group/google-appengine?hl=en.

--
http://about.me/david.mora
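To put Dmitry's numbers next to the documented "less than 1 in 3000" figure, here is a small Python calculation. It takes 127.58K as 127,580. The comparison is only rough: the thread's numbers are per task or per request, but the documented figure is per datastore operation, and one request can issue several operations.

```python
# Error rates quoted in this thread vs. the documented timeout rate
# ("generally less than 1 in 3000" datastore operations).
DOCUMENTED_RATE = 1 / 3000


def error_rate(errors: int, requests: int) -> float:
    """Fraction of requests that failed."""
    return errors / requests


def one_in(rate: float) -> float:
    """Express a rate as '1 in N'."""
    return 1 / rate


task_rate = error_rate(192, 127_580)  # background tasks, 21 Feb
user_rate = error_rate(89, 1_700)     # user-facing requests, 21 Feb

for name, rate in [("tasks", task_rate), ("user requests", user_rate)]:
    print(f"{name}: {rate:.3%} (about 1 in {one_in(rate):.0f}), "
          f"{rate / DOCUMENTED_RATE:.1f}x the documented rate")
# tasks: 0.150% (about 1 in 664), 4.5x the documented rate
# user requests: 5.235% (about 1 in 19), 157.1x the documented rate
```

The task failures are within a small multiple of the documented rate, and some of them may be contention. The user-facing rate is two orders of magnitude higher. That difference, more than the absolute numbers, is what makes it look like bursts rather than background noise.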
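Dmitry says the retries don't fit inside the 30-second user request. One way to keep retries from running past that limit is to give them an explicit time budget, combined with a shorter per-attempt deadline in the spirit of Robert's lowered RPC limit.

This is a minimal plain-Python sketch under several assumptions:
- `DatastoreTimeout` is a hypothetical stand-in for the SDK's timeout error.
- The operation `op` is assumed to accept a deadline and apply it to its own RPC. How that is done depends on the SDK and is not shown.
- The 20 s budget and 5 s per-call deadline are arbitrary illustrative values.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class DatastoreTimeout(Exception):
    """Hypothetical stand-in for the SDK's datastore timeout error."""


def retry_within_budget(
    op: Callable[[float], T],
    budget: float = 20.0,            # seconds; leaves margin under the 30 s request limit
    per_call_deadline: float = 5.0,  # short per-attempt RPC deadline (assumed knob)
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> T:
    """Call op(deadline) until it succeeds or the time budget is used up.

    op is expected to apply the given deadline (in seconds) to its own RPC.
    If the budget runs out, the last DatastoreTimeout is re-raised so the
    caller can fall back (show an error, or defer the write).
    """
    start = clock()
    attempt = 0
    while True:
        remaining = budget - (clock() - start)
        try:
            return op(min(per_call_deadline, remaining))
        except DatastoreTimeout:
            attempt += 1
            # Exponential backoff with jitter so retries don't all line up.
            delay = base_delay * 2 ** attempt * random.uniform(0.5, 1.0)
            if clock() - start + delay >= budget:
                raise  # not enough time left for another attempt
            sleep(delay)
```

The operation should be safe to repeat. A datastore write that reports a timeout has not necessarily failed. In Dmitry's case, that matters for the stats task enqueued inside the transaction, which could otherwise run twice.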
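Robert's approach for user requests is to catch deadline, timeout and application (5) errors and defer the write via the task queue. It can be sketched roughly as below.

This is plain Python, and all of the following are hypothetical in-memory stand-ins:
- `DatastoreTimeout` stands in for those errors.
- `enqueue_task` and the `pending` deque stand in for the task queue.
- `run_pending_once` simulates the queue running tasks.

A real version would hand the write to App Engine's task queue instead.

```python
from collections import deque
from typing import Any, Callable, Deque, Dict, Tuple


class DatastoreTimeout(Exception):
    """Hypothetical stand-in for deadline / timeout / application (5) errors."""


# Hypothetical in-memory stand-in for the task queue: (write_fn, kwargs) pairs.
pending: Deque[Tuple[Callable[..., Any], Dict[str, Any]]] = deque()


def enqueue_task(fn: Callable[..., Any], **kwargs: Any) -> None:
    pending.append((fn, kwargs))


def save_or_defer(write: Callable[..., Any], **kwargs: Any) -> bool:
    """Try the write inline; on a timeout, queue it for later.

    Returns True if the write happened now, False if it was deferred.
    """
    try:
        write(**kwargs)
        return True
    except DatastoreTimeout:
        enqueue_task(write, **kwargs)
        return False


def run_pending_once() -> int:
    """Simulate one task-queue pass: run each queued write once.

    Writes that time out again stay queued (a real queue retries them
    with backoff). Returns the number of writes that succeeded.
    """
    done = 0
    for _ in range(len(pending)):
        fn, kwargs = pending.popleft()
        try:
            fn(**kwargs)
            done += 1
        except DatastoreTimeout:
            pending.append((fn, kwargs))
    return done
```

The trade-off is that the user gets a normal response while the write may land a little later. Pages that read the data back right away need to tolerate that delay, and the deferred write should be idempotent, since the inline attempt may have partly succeeded.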
