Aha! Thanks for the explanation. That is very helpful. So, it could just be a single bad request that pretty much times out all additional requests down the pipe. We've added rack-timeout already, and next time we hit one such bad request, we'll know with an exceptional report!
Subbu.

On Sun, Nov 7, 2010 at 8:27 PM, daniel hoey <[email protected]> wrote:
> Just to follow up on my original post: We had one action that we knew
> had a timeout problem but we hadn't prioritized fixing it. We
> eventually discovered that this action caused other requests to
> time out. The understanding that I got from talking to Heroku support
> is that when a request comes in it gets assigned to a dyno immediately. For
> the purposes of heroku timeouts the request 'start time' is now. But
> if that dyno is currently processing some other request then the new
> request will just wait. If 30 seconds pass and the first request has
> not finished processing, then both requests time out. Note also that if
> the first request takes 29s and the second request takes 2s then the
> second request will time out.
>
> We ended up putting SystemTimer (http://systemtimer.rubyforge.org/)
> timeouts around some of our actions and filters so an exception gets
> raised when something times out; rack-timeout looks like a better way
> of doing this. We also used New Relic Silver to find the actions that
> were the root cause of the problem.
>
> Basically the moral of the story is that you have to make sure that
> none of your actions ever time out.
>
> On Nov 6, 4:31 am, Oren Teich <[email protected]> wrote:
>> I've seen a few people with weird timeouts where the app owner was
>> able to find out that it was a bug in their code. Anything from a
>> weird SQL query locking a table that was hanging their process to API
>> requests to other hard to track stuff.
>>
>> This gem (http://github.com/kch/rack-timeout) will timeout your
>> requests after a period you specify. The advantage of this is you can
>> set it to a short time, and exceptional/hoptoad should catch the
>> timeout giving you some indication in the backtrace of what's going
>> on.
>>
>> Oren
>>
>> On Fri, Nov 5, 2010 at 9:06 AM, Subbu Sastry <[email protected]> wrote:
>> > Has anyone found a reasonable solution to this problem yet? On our
>> > app as well, we notice totally random timeout errors that couldn't
>> > possibly be associated with db lookup -- sometimes requests time out on
>> > pages that look up a row by primary key on a table with 15 records.
>> > Favicon.ico timed out as well. The timeouts seem arbitrary, and
>> > *always* get fixed on server restart (heroku restart). This has
>> > happened to us a few times over the last week. And yes, as several of
>> > you have noted, there are no exceptions raised (neither exceptional nor
>> > NewRelic).
>>
>> > I think given that we experienced timeout with favicon.ico and an
>> > about page with a single db lookup and newrelic doesn't see this at
>> > all, I suspect this is something higher up the heroku stack that is
>> > timing out .. It almost smells like a memory leak somewhere which is
>> > how app restart seems to fix the problem. Now, the question is
>> > whether the memory leak is in our app or somewhere else (plugins,
>> > gems, interaction with heroku stack) ... I will debug this, but wanted
>> > to see if someone else has found a reasonable solution to this.
>>
>> > Subbu.
>>
>> > On Oct 6, 9:37 pm, mattsly <[email protected]> wrote:
>> >> In just manual testing my app, I've seen a fair number of timeouts
>> >> (maybe a dozen) but have not received any communication. I am pretty
>> >> sure I'd have no idea they occurred had I not personally witnessed the
>> >> error page. I find this a borderline "ship blocker" for a migration
>> >> to Heroku as I consider migrating a ~500K monthly page view app to
>> >> Heroku, and get very anxious thinking about lots of users seeing a funky
>> >> error page and having no way of being alerted or knowing how prevalent
>> >> the issue is.
>>
>> >> WRT the timeouts, it's maybe 1% of requests that time out...and I
>> >> still can't pin down why they're happening. I'm on a single dyno,
>> >> with Koi, and < 5 alpha testers on it "concurrently" (and timeout
>> >> errors are related to response...not concurrency...) and these are
>> >> extremely simple paging requests, that according to New Relic, return
>> >> in ~100 ms on average...and then all of a sudden...bam! - a
>> >> request timeout. And we're talking about essentially the exact same code
>> >> path, except a different :offset in the ActiveRecord find call. The
>> >> complexity is nothing along the lines of suggested timeout causes
>> >> here: http://docs.heroku.com/performance#request-timeout
>>
>> >> Strangely, I just tried turning off all varnish level caching (which I
>> >> hope to rely on heavily) to try and isolate the issue and now perf
>> >> seems *more* consistent and faster (haven't seen a timeout yet). Could
>> >> it be that the timeouts are being caused during lookup at the Varnish
>> >> layer? My understanding is this wouldn't be a possible explanation, as
>> >> I think the dyno doesn't even catch a request if a varnish cache
>> >> hit is found. So maybe Varnish caching is a red herring...but does
>> >> seem curious.
>>
>> >> Matt
>>
>> >> On Sep 24, 7:56 pm, John Norman <[email protected]> wrote:
>>
>> >> > Well, you should get an e-mail if your app is generating backlogs.
>>
>> >> > I have one app that did generate 2 in a whole week, and I received at
>> >> > least two e-mails from Heroku suggesting that I up the number of dynos.
>>
>> >> > On Fri, Sep 24, 2010 at 11:42 AM, mattsly <[email protected]> wrote:
>> >> > > How are you finding the timeouts? Just manually?
>> >> > > I was having timeout
>> >> > > issues (that I now think I've solved - see below) but am concerned
>> >> > > that, once I flip my site public, that:
>>
>> >> > > a) There's no apparent native reporting/alerting for timeouts or
>> >> > > backlog too deep errors if they do occur
>> >> > > b) No ability to render a custom (static) error page in that case
>>
>> >> > > Re: reporting. When timeouts occur, am I mistaken in not seeing them
>> >> > > reported anywhere? They don't seem to throw exceptional or new relic
>> >> > > exceptions with the free version? It's unclear to me that they would
>> >> > > be with the (expensive - $.05/hr = $36/month for alerting?) "Silver"
>> >> > > - can anyone confirm that they in fact do?
>>
>> >> > > It seems like timeout/backlog too deep reporting/alerting should
>> >> > > really be a built-in feature of Heroku, since they are core elements
>> >> > > in the architecture, and such alerting (especially backlog) helps you
>> >> > > make a quick call about cranking dyno count up/down and/or restarting
>> >> > > an app to minimize adverse user effects...i.e. really what this cloud
>> >> > > and hosting-as-a-service thing is all about.
>>
>> >> > > I'm about to (I think) migrate a high traffic site to Heroku. I *love*
>> >> > > the idea of being able to focus on development and not sysadmin...but
>> >> > > have to say that I am getting a little anxious about quirks like this
>> >> > > and what it might mean for my users.
>>
>> >> > > Matt
>>
>> >> > > (On a slightly related note - I've learned the hard way that
>> >> > > Table.count is a great way to cause a timeout - looks like MySQL and
>> >> > > PostgreSQL handle counts *way* differently...something to keep in mind
>> >> > > if you're migrating from mysql:
>> >> > > http://www.wikivs.com/wiki/MySQL_vs_PostgreSQL#COUNT.28.2A.29)
>>
>> >> > > On Sep 10, 3:45 am, daniel hoey <[email protected]> wrote:
>> >> > > > We go through short periods where we get frequent app timeouts.
>> >> > > > The pages that time out are often very simple and do not rely on
>> >> > > > external services or perform any demanding database queries. We
>> >> > > > don't get any information in our New Relic transaction traces for
>> >> > > > these queries (we have for other timeouts in the past). Basically we
>> >> > > > can't get any information about what is going on, and only know about
>> >> > > > the problem if our users tell us. Has anyone else experienced similar
>> >> > > > problems or have anything to suggest in terms of investigating the
>> >> > > > root cause?
>>
>> >> > > > The last time that we are aware of this happening was between 06:30
>> >> > > > and 07:00 GMT on Sept 10.
>>
>> >> > > --
>> >> > > You received this message because you are subscribed to the Google Groups
>> >> > > "Heroku" group.
>> >> > > To post to this group, send email to [email protected].
>> >> > > To unsubscribe from this group, send email to
>> >> > > [email protected]<heroku%[email protected]>.
>> >> > > For more options, visit this group at
>> >> > > http://groups.google.com/group/heroku?hl=en.
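
The queuing behaviour daniel describes in the thread (the timeout clock starts when a request is assigned to a dyno, not when the dyno starts working on it) can be checked with a little arithmetic. The following is a minimal sketch under that description only; `ROUTER_TIMEOUT` and `simulate_dyno` are hypothetical names made up for illustration, not Heroku or rack-timeout APIs, and the model assumes a single dyno that handles one request at a time.

```ruby
# Illustrative model only: names and structure are assumptions, not a Heroku API.
ROUTER_TIMEOUT = 30.0 # seconds, the limit mentioned in the thread

# requests: array of [arrival_time, processing_time] pairs in arrival order,
# all assigned to the same dyno, which processes them one at a time.
# Returns one hash per request: time spent waiting, total elapsed time,
# and whether the total exceeds the timeout.
def simulate_dyno(requests, timeout: ROUTER_TIMEOUT)
  dyno_free_at = 0.0
  requests.map do |arrival, processing|
    started  = [arrival, dyno_free_at].max # wait while the dyno is busy
    finished = started + processing
    dyno_free_at = finished
    total = finished - arrival             # clock runs from arrival, not from start of work
    { wait: started - arrival, total: total, timed_out: total > timeout }
  end
end

# The example from the thread: a 29s request followed by a 2s request.
results = simulate_dyno([[0.0, 29.0], [0.0, 2.0]])
results.each_with_index do |r, i|
  puts "request #{i + 1}: waited #{r[:wait]}s, total #{r[:total]}s, timed out: #{r[:timed_out]}"
end
```

In this model the 2s request is the one flagged as timed out, because its 29s wait counts against the 30s limit. That is consistent with fast pages such as favicon.ico appearing to time out at random when they queue behind one slow action.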
