Entities are added at a slow pace: around 500 new entities a day, evenly distributed throughout the day. So I don't think that's the issue.

Updates, on the other hand, are much more frequent: around 45/second. I update the "eta" property to the time at which the entity needs to be processed again (each entity represents a blog feed that my system pulls often). I have tasks that pull the entities with eta < now and process them. Each task pulls 50 entities at a time and then inserts a new task to continue the work (a chain of tasks). The first task runs the query I mentioned earlier and then passes the cursor to the next task. Only the query in the first task is timing out; once that works, the following tasks that use the cursor run without problems.

Entities use a key_name that is a hash of the feed URL, so they're evenly distributed on disk. The index on the "eta" column, on the other hand, is probably not evenly distributed on disk. However, if the problem were due to a hot tablet, I'd expect the issue to show up while updating the "eta" value as each entity is processed. That doesn't happen. All updates work without problems.

When the first task in the chain runs the query and it times out, it doesn't insert the next task, which means that once the first task fails, the whole system stops. The task gets retried until it succeeds, which might take 20+ attempts. Due to the exponential back-off of the task queue, that usually takes hours. During that time, the app has almost no activity.

So the interesting thing is that I'm getting these time-outs on this specific table (and no other tables), only when reading (not when writing), only when I don't pass a cursor, and even when there is no load on the app. Also, the app has been running like this for over a year, and the problem started just recently. As far as I can tell, it's a datastore bug. I hope to be proven wrong, though.
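
To make the task chain described above concrete, here is a rough sketch in plain Python. It is not the app's actual code and does not use the App Engine APIs: the "eta" index is simulated with a list already sorted by eta, the cursor is just a position in that list, and the task queue is a simple in-memory queue. The names `Feed`, `fetch_due` and `run_chain` are hypothetical and exist only for illustration.

```python
from collections import deque
from dataclasses import dataclass
from typing import List, Optional, Tuple

BATCH_SIZE = 50  # entities pulled per task, as described above


@dataclass
class Feed:
    key_name: str  # in the real app: a hash of the feed URL
    eta: float     # time at which the feed is due to be processed again


def fetch_due(index: List[Feed], now: float, cursor: Optional[int] = None,
              limit: int = BATCH_SIZE) -> Tuple[List[Feed], Optional[int]]:
    """Simulate the query "eta < now", ordered by eta.

    `index` must be sorted by eta. `cursor` is None for the first task
    (the query that times out in the real app) and a resume position
    for every later task in the chain.
    """
    pos = 0 if cursor is None else cursor
    batch = []
    while pos < len(index) and len(batch) < limit and index[pos].eta < now:
        batch.append(index[pos])
        pos += 1
    # Only hand a cursor to a follow-up task if this batch was full.
    next_cursor = pos if len(batch) == limit else None
    return batch, next_cursor


def run_chain(index: List[Feed], now: float) -> Tuple[List[str], int]:
    """Run the chain: each task handles one batch, then enqueues the next."""
    queue = deque([None])  # the first task starts without a cursor
    processed = []
    tasks_run = 0
    while queue:
        cursor = queue.popleft()
        tasks_run += 1
        batch, next_cursor = fetch_due(index, now, cursor)
        processed.extend(feed.key_name for feed in batch)
        if next_cursor is not None:
            queue.append(next_cursor)  # "insert a new task" with the cursor
    return processed, tasks_run
```

In this model, if the very first `fetch_due` call (the one without a cursor) fails, nothing is ever enqueued and the chain stalls, which matches the behavior described above.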
Waleed

On Thu, May 26, 2011 at 10:27 PM, Robert Kluin <[email protected]> wrote:

> I had the same thought as Stephen about the tablet splitting, but that
> wouldn't last for hours and hours unless you're adding new data at a
> very high rate during that time. Also, I'd expect the datastore
> viewer to not work correctly if your in-code queries were failing
> because of that.
>
> How do you get new data into the system? How many entities are you
> trying to fetch in a batch? What kind of changes are you making to
> these entities? When this problem is happening, is it just the one
> (query) task that is impacted or are other parts of your app impacted
> as well? If you insert a new version of that task does it run even
> though the other one keeps failing?
>
> Robert
>
> On Thu, May 26, 2011 at 03:50, Waleed Abdulla <[email protected]> wrote:
> > Thanks Stephen. Good point about the possibility of background splitting.
> > But then again, the app has been running for a year without problems, and
> > suddenly last week that query started to time out. I didn't do any app
> > updates recently to cause this.
> > And when the query times out, it tends to keep timing out again and again
> > for hours. So even if there is a background data re-organization happening,
> > it shouldn't keep the table unusable for hours like that. There must be
> > another explanation.
> > Waleed
> >
> > On Wed, May 25, 2011 at 2:43 PM, Stephen <[email protected]> wrote:
> >>
> >> On Wed, May 25, 2011 at 8:09 PM, Waleed Abdulla <[email protected]> wrote:
> >> > Stephen,
> >> > I don't see how your suggestion would help! Can you please elaborate
> >> > on how it's related?
> >>
> >> This doesn't apply if you're not deleting, but deleted entities (and
> >> index entries) aren't deleted immediately but marked deleted and
> >> purged later. The dead index entries must be skipped over in queries
> >> before locating live entries.
> >>
> >> > Also, I'm not deleting any entities. I'm just updating
> >> > them. And when the query is timing out, it does so even when there is no
> >> > load on the app.
> >>
> >> So perhaps a high rate of inserts/updates on your monotonically
> >> increasing eta index is overloading a tablet server and causing
> >> frequent splitting? I guess it might not always correspond directly
> >> with traffic to the app as the datastore schedules the rearranging.
> >>
> >> If you do have a high update rate, maybe try to aggressively batch
> >> them into large transactions?
> >>
> >> --
> >> You received this message because you are subscribed to the Google Groups
> >> "Google App Engine" group.
> >> To post to this group, send email to [email protected].
> >> To unsubscribe from this group, send email to
> >> [email protected].
> >> For more options, visit this group at
> >> http://groups.google.com/group/google-appengine?hl=en.
