Yes, I don't think the writes are the problem either. The error happens on the SELECT, and it happens when there is no load. Also, it's not a code error because:
1. It's been working for a year without problems and only started breaking two weeks ago with no change on my side 2. Running the simple SELECT query from the Datastore Viewer times out as well. The hack that I came up with to create a second index seems to be helping (I described it in this thread a few emails ago). With the new index, the query works without problems some days and only times out about 30 to 40 times on other days (as opposed to timing out hundreds of times before the new hack). I don't know how long this will keep working, though. But here is the scary thought: if I didn't find that hack, then my app would be broken for two weeks now and no one from Google has commented or looked into it. On Fri, May 27, 2011 at 10:43 AM, Robert Kluin <[email protected]>wrote: > Hey Waleed, > I doubt 45writes / second would cause any issues. I've sustained > higher rates without problems, even with indexed datetimes. I'm sure > you've checked it carefully, but have you investigated the logic > surrounding the cursor? Maybe there is some subtle bug that is > resulting in a bug on the first run? > > Hopefully Ikai might be able to chime in a give some further > guidance. This seems to be a strange issue, I'd very much like to > know the resolution to this. > > > Robert > > > > > > On Fri, May 27, 2011 at 03:00, Waleed Abdulla <[email protected]> wrote: > > Entities are added at a slow pace: around 500 or so new entities a day, > and > > evenly distributed though out the day. So I don't think that's the issue. > > Updates, on the other hand are much more often: around 45/second. I > update > > the "eta" property to the time at which this entity needs to be processed > > again (each entity represents a blog feed that my system pulls often). > And I > > have tasks that pull the entities that have an eta < now and process > them. > > The task pulls 50 entities at a time and then inserts a new task to > continue > > the work (a chain of tasks). The first task runs the query I mentioned > > earlier, and then it passes the cursor to the next. Only the query of the > > first time is timing out, but once that works, the following tasks that > use > > the cursor work without problems. > > Entities use a key_name that is a hash of the feed URL, so they're evenly > > distributed on disk. The index on the "eta" column, on the other hand, is > > probably not evenly distributed on disk. However, if the problem was due > to > > a hot tablet, then I'd expect the issue to happen while updating the > "eta" > > value while processing each entity. But that doesn't happen. All updates > > work without problems. > > When the first task in the chain runs the query mentioned earlier and it > > times out, then it doesn't insert the next task. And that means once that > > first task fails, the whole system stops. The task gets retried until it > > succeeds, which might take 20+ attempts. And due to the exponential > back-off > > of the task queue, that usually takes hours. During that time, the app > has > > almost no activity. > > So the interesting thing is that I'm getting these time-outs on this > > specific table (and no other tables) and I'm getting it when trying to > read > > (but not on write), and only when I don't pass a cursor, and it happens > even > > when there is no load on the app. Also, the app has been running like > that > > for over a year, and this started just recently. > > As far as I can tell, it's a datastore bug. I hope to be proven wrong, > > though. > > Waleed > > > > > > > > > > On Thu, May 26, 2011 at 10:27 PM, Robert Kluin <[email protected]> > > wrote: > >> > >> I had the same thought as Stephen about the tablet splitting, but that > >> wouldn't last for hours and hours unless your adding new data at a > >> very high rate durring that time. Also, I'd expect the datastore > >> viewer to not work correctly if your in code queries were failing > >> because of that. > >> > >> How do you get new data into the system? How many entities are you > >> trying to fetch in a batch? What kind of changes are you making to > >> these entities? When this problem is happening, is it just the one > >> (query) task that is impacted or are other parts of your app impacted > >> as well? If you insert a new version of that task does it run even > >> though the other one keeps failing? > >> > >> > >> > >> Robert > >> > >> > >> > >> > >> On Thu, May 26, 2011 at 03:50, Waleed Abdulla <[email protected]> wrote: > >> > Thanks Stephen. Good point about the possibility of background > >> > splitting. > >> > But then again, the app has been running for a year without problems, > >> > and > >> > suddenly last week that query started to timeout. I didn't do any app > >> > updates recently to cause this. > >> > And when the query times-out, it tends to keep timing out again and > >> > again > >> > for hours. So even if there is a background data re-organization > >> > happening, > >> > it shouldn't keep the table unusable for hours like that. There must > be > >> > another explanation. > >> > Waleed > >> > > >> > > >> > > >> > On Wed, May 25, 2011 at 2:43 PM, Stephen <[email protected]> > >> > wrote: > >> >> > >> >> On Wed, May 25, 2011 at 8:09 PM, Waleed Abdulla <[email protected]> > >> >> wrote: > >> >> > Stephen, > >> >> > I don't see how your suggestion would help! Can you please > >> >> > elaborate > >> >> > on > >> >> > how it's related? > >> >> > >> >> This doesn't apply if you're not deleting, but deleted entities (and > >> >> index entries) aren't deleted immediately but marked deleted and > >> >> purged later. The dead index entries must be skipped over in queries > >> >> before locating live entries. > >> >> > >> >> > Also, I'm not deleting any entities. I'm just updating > >> >> > them. And when the query is timing out, it does so even when there > is > >> >> > no > >> >> > load on the app. > >> >> > >> >> So perhaps a high rate of inserts/updates on your monotonically > >> >> increasing eta index is overloading a tablet server and causing > >> >> frequent splitting? I guess it might not always correspond directly > >> >> with traffic to the app as the datastore schedules the rearranging. > >> >> > >> >> If you do have a high update rate, maybe try to aggressively batch > >> >> them into large transactions? > >> >> > >> >> -- > >> >> You received this message because you are subscribed to the Google > >> >> Groups > >> >> "Google App Engine" group. > >> >> To post to this group, send email to > [email protected]. > >> >> To unsubscribe from this group, send email to > >> >> [email protected]. > >> >> For more options, visit this group at > >> >> http://groups.google.com/group/google-appengine?hl=en. > >> >> > >> > > >> > -- > >> > You received this message because you are subscribed to the Google > >> > Groups > >> > "Google App Engine" group. > >> > To post to this group, send email to > [email protected]. > >> > To unsubscribe from this group, send email to > >> > [email protected]. > >> > For more options, visit this group at > >> > http://groups.google.com/group/google-appengine?hl=en. > >> > > >> > >> -- > >> You received this message because you are subscribed to the Google > Groups > >> "Google App Engine" group. > >> To post to this group, send email to [email protected]. > >> To unsubscribe from this group, send email to > >> [email protected]. > >> For more options, visit this group at > >> http://groups.google.com/group/google-appengine?hl=en. > >> > > > > -- > > You received this message because you are subscribed to the Google Groups > > "Google App Engine" group. > > To post to this group, send email to [email protected]. > > To unsubscribe from this group, send email to > > [email protected]. > > For more options, visit this group at > > http://groups.google.com/group/google-appengine?hl=en. > > > > -- > You received this message because you are subscribed to the Google Groups > "Google App Engine" group. > To post to this group, send email to [email protected]. > To unsubscribe from this group, send email to > [email protected]. > For more options, visit this group at > http://groups.google.com/group/google-appengine?hl=en. > > -- You received this message because you are subscribed to the Google Groups "Google App Engine" group. To post to this group, send email to [email protected]. To unsubscribe from this group, send email to [email protected]. For more options, visit this group at http://groups.google.com/group/google-appengine?hl=en.
