Re: App Timeouts

Subramanya Sastry Sun, 07 Nov 2010 18:35:01 -0800

Aha!  Thanks for the explanation.  That is very helpful.  So, it could
just be a single bad request that pretty much times out all additional
requests down the pipe.  We've added rack-timeout already, and next
time we hit one such bad request, we'll know with an exceptional
report!
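
To make the arithmetic in the quoted explanation concrete, here is a minimal sketch in Ruby. It assumes a single dyno that works through requests strictly one at a time, a 30-second limit measured from each request's arrival, and a dyno that keeps working on a request even after it has timed out. The names `Request`, `simulate`, and `ROUTER_LIMIT` are made up for illustration and are not part of Heroku or any gem.

```ruby
# Hypothetical model (not Heroku code): one dyno, requests served in order.
ROUTER_LIMIT = 30.0 # seconds; the limit quoted in this thread

# arrival and work are both in seconds.
Request = Struct.new(:name, :arrival, :work)

# requests must be listed in arrival order.
# Returns one hash per request with the time elapsed since its arrival
# and whether that exceeds the limit.
def simulate(requests, limit = ROUTER_LIMIT)
  free_at = 0.0 # when the dyno next becomes idle
  requests.map do |req|
    start   = [free_at, req.arrival].max # wait if the dyno is busy
    finish  = start + req.work
    free_at = finish
    elapsed = finish - req.arrival # the clock starts at arrival
    { name: req.name, elapsed: elapsed, timed_out: elapsed > limit }
  end
end

# The 29s + 2s case from the quoted message.
results = simulate([
  Request.new("slow action", 0.0, 29.0),
  Request.new("favicon.ico", 0.0, 2.0)
])
results.each do |r|
  puts "#{r[:name]}: #{r[:elapsed]}s #{r[:timed_out] ? 'TIMEOUT' : 'ok'}"
end
```

In this model the cheap request is charged for the time it spends waiting behind the slow one, which would explain a favicon.ico timing out.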


Subbu.

On Sun, Nov 7, 2010 at 8:27 PM, daniel hoey <[email protected]> wrote:
> Just to follow up on my original post: We had one action that we knew
> had a timeout problem but we hadn't prioritized fixing it. We
> eventually discovered that this action caused other requests to
> time out. The understanding that I got from talking to Heroku support
> is that when a request comes in it gets assigned to a dyno immediately.
> For the purposes of Heroku timeouts, the request's 'start time' is that
> moment. But if that dyno is currently processing some other request,
> then the new request will just wait. If 30 seconds pass and the first
> request has not finished processing, then both requests time out. Note
> also that if the first request takes 29s and the second request takes
> 2s, then the second request will time out.
>
> We ended up putting SystemTimer (http://systemtimer.rubyforge.org/)
> timeouts around some of our actions and filters so an exception gets
> raised when something times out. rack-timeout looks like a better way
> of doing this. We also used New Relic Silver to find the actions that
> were the root cause of the problem.
>
> Basically the moral of the story is that you have to make sure that
> none of your actions ever time out.
>
> On Nov 6, 4:31 am, Oren Teich <[email protected]> wrote:
>> I've seen a few people with weird timeouts where the app owner was
>> able to find out that it was a bug in their code.  Anything from a
>> weird SQL query locking a table that was hanging their process to API
>> requests to other hard to track stuff.
>>
>> This gem (http://github.com/kch/rack-timeout) will time out your
>> requests after a period you specify.  The advantage of this is you can
>> set it to a short time, and exceptional/hoptoad should catch the
>> timeout, giving you some indication in the backtrace of what's going
>> on.
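
As a rough illustration of the idea (raise an exception inside the app when a request runs too long, so that an exception tracker has something to report), here is a sketch of a Rack-style middleware built on Ruby's standard `Timeout` module. This is not the rack-timeout gem's code or API; the class name `SimpleRequestTimeout` and its `seconds:` option are hypothetical, and the 10-second default is an arbitrary choice.

```ruby
require "timeout"

# Hypothetical middleware for illustration only; NOT the rack-timeout gem.
class SimpleRequestTimeout
  # Raised when the wrapped app takes longer than the allowed time.
  class Error < StandardError; end

  # app is any Rack-style object responding to call(env).
  def initialize(app, seconds: 10)
    @app = app
    @seconds = seconds
  end

  # Runs the wrapped app, raising Error if it exceeds @seconds.
  def call(env)
    Timeout.timeout(@seconds, Error) { @app.call(env) }
  end
end

# Usage with a stand-in app (a lambda) that is deliberately too slow.
slow_app = ->(env) { sleep 1; [200, {}, ["ok"]] }
guarded  = SimpleRequestTimeout.new(slow_app, seconds: 0.1)

begin
  guarded.call({})
rescue SimpleRequestTimeout::Error => e
  puts "request aborted: #{e.class}"
end
```

Setting the limit well below 30 seconds means the exception fires inside the app before the platform gives up on the request. Ruby's `Timeout` interrupts code at an arbitrary point, so this is better treated as a diagnostic aid than as a fix for the slow action itself.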
>>
>> Oren
>>
>> On Fri, Nov 5, 2010 at 9:06 AM, Subbu Sastry <[email protected]> wrote:
>> > Has anyone found a reasonable solution to this problem yet?  On our
>> > app as well, we notice totally random timeout errors that couldn't
>> > possibly be associated with db lookup -- sometimes requests time out on
>> > pages that look up a row by primary key on a table with 15 records.
>> > Favicon.ico timed out as well.  The timeouts seem arbitrary, and
>> > *always* get fixed on server restart (heroku restart).  This has
>> > happened to us a few times over the last week.  And yes, as several of
>> > you have noted, there are no exceptions raised (in neither Exceptional
>> > nor NewRelic).
>>
>> > Given that we experienced timeouts with favicon.ico and an
>> > about page with a single db lookup, and NewRelic doesn't see this at
>> > all, I suspect this is something higher up the Heroku stack that is
>> > timing out. It almost smells like a memory leak somewhere, which would
>> > explain why an app restart seems to fix the problem.  Now, the question
>> > is whether the memory leak is in our app or somewhere else (plugins,
>> > gems, interaction with the Heroku stack) ... I will debug this, but
>> > wanted to see if someone else has found a reasonable solution to this.
>>
>> > Subbu.
>>
>> > On Oct 6, 9:37 pm, mattsly <[email protected]> wrote:
>> >> In just manually testing my app, I've seen a fair number of timeouts
>> >> (maybe a dozen) but have not received any communication.  I am pretty
>> >> sure I'd have no idea they occurred had I not personally witnessed the
>> >> error page.  I find this a borderline "ship blocker" as I consider
>> >> migrating a ~500K monthly page view app to Heroku, and I get very
>> >> anxious thinking about lots of users seeing a funky error page while I
>> >> have no way of being alerted or knowing how prevalent the issue is.
>>
>> >> WRT the timeouts, it's maybe 1% of requests that time out...and I
>> >> still can't pin down why they're happening.  I'm on a single dyno,
>> >> with Koi, and < 5 alpha testers on it "concurrently" (and timeout
>> >> errors are related to response...not concurrency...) and these are
>> >> extremely simple paging requests that, according to New Relic, return
>> >> in ~100ms on average...and then all of a sudden...bam! - a request
>> >> timeout.  And we're talking about essentially the exact same code
>> >> path, except a different :offset in the ActiveRecord find call.  The
>> >> complexity is nothing along the lines of the suggested timeout causes
>> >> here: http://docs.heroku.com/performance#request-timeout
>>
>> >> Strangely, I just tried turning off all Varnish-level caching (which I
>> >> hope to rely on heavily) to try and isolate the issue and now perf
>> >> seems *more* consistent and faster (haven't seen a timeout yet). Could
>> >> it be that the timeouts are being caused during lookup at the Varnish
>> >> layer? My understanding is this wouldn't be a possible explanation, as
>> >> I think the dyno doesn't even see a request if a Varnish cache
>> >> hit is found.  So maybe Varnish caching is a red herring...but it does
>> >> seem curious.
>>
>> >> Matt
>>
>> >> On Sep 24, 7:56 pm, John Norman <[email protected]> wrote:
>>
>> >> > Well, you should get an e-mail if your app is generating backlogs.
>>
>> >> > I have one app that did generate 2 in a whole week, and I received
>> >> > at least two e-mails from Heroku suggesting that I up the number of
>> >> > dynos.
>>
>> >> > On Fri, Sep 24, 2010 at 11:42 AM, mattsly <[email protected]> wrote:
>> >> > > How are you finding the timeouts? Just manually?  I was having timeout
>> >> > > issues (that I now think I've solved - see below) but am concerned
>> >> > > that, once I flip my site public, that:
>>
>> >> > > a) There's no apparent native reporting/alerting for timeouts or
>> >> > > backlog too deep errors if they do occur
>> >> > > b) No ability to render a custom (static) error page in that case
>>
>> >> > > Re: reporting. When timeouts occur, am I mistaken in not seeing them
>> >> > > reported anywhere?  They don't seem to throw Exceptional or New Relic
>> >> > > exceptions with the free version?  It's unclear to me that they would
>> >> > > be with the (expensive - $.05/hr = $36/month for alerting?) "Silver"
>> >> > > - can anyone confirm that they in fact do?
>>
>> >> > > It seems like timeout/backlog too deep reporting/alerting should
>> >> > > really be a built-in feature of Heroku, since they are core elements
>> >> > > in the architecture, and such alerting (especially backlog) helps you
>> >> > > make a quick call about cranking dyno count up/down and/or restarting
>> >> > > an app to minimize adverse user effects...i.e. really what this cloud
>> >> > > and hosting-as-a-service thing is all about.
>>
>> >> > > I'm about to (I think) migrate a high traffic site to Heroku. I *love*
>> >> > > the idea of being able to focus on development and not sysadmin...but
>> >> > > have to say that I am getting a little anxious about quirks like this
>> >> > > and what it might mean for my users.
>>
>> >> > > Matt
>>
>> >> > > (On a slightly related note - I've learned the hard way that
>> >> > > Table.count is a great way to cause a timeout - looks like MySQL and
>> >> > > PostgreSQL handle counts *way* differently...something to keep in mind
>> >> > > if you're migrating from MySQL:
>> >> > > http://www.wikivs.com/wiki/MySQL_vs_PostgreSQL#COUNT.28.2A.29)
>>
>> >> > > On Sep 10, 3:45 am, daniel hoey <[email protected]> wrote:
>> >> > > > We go through short periods where we get frequent app timeouts. The
>> >> > > > pages that time out are often very simple and do not rely on
>> >> > > > external services or perform any demanding database queries. We
>> >> > > > don't get any information in our New Relic transaction traces for
>> >> > > > these requests (we have for other timeouts in the past). Basically we
>> >> > > > can't get any information about what is going on, and only know
>> >> > > > about the problem if our users tell us. Has anyone else experienced
>> >> > > > similar problems or have anything to suggest in terms of
>> >> > > > investigating the root cause?
>>
>> >> > > > The last time that we are aware of this happening was between 06:30
>> >> > > > and 07:00 GMT on Sept 10.
>>
>> >> > > --
>> >> > > You received this message because you are subscribed to the Google 
>> >> > > Groups
>> >> > > "Heroku" group.
>> >> > > To post to this group, send email to [email protected].
>> >> > > To unsubscribe from this group, send email to
>> >> > > [email protected]<heroku%[email protected]>
>> >> > > .
>> >> > > For more options, visit this group at
>> >> > >http://groups.google.com/group/heroku?hl=en.
>>
>

