This is the second time I've experienced this issue in the four months I've been using Heroku in production!
On Apr 21, 1:13 pm, Jeff Schmitz <[email protected]> wrote:
> Latest:
>
> I suppose Heroku is in the unavailability zone that is still down. Sorry, Freudian slip.
>
> 12:30 PM PDT We have observed successful new launches of EBS backed instances for the past 15 minutes in all but one of the availability zones in the US-EAST-1 Region. The team is continuing to work to recover the unavailable EBS volumes as quickly as possible.
>
> On Thu, Apr 21, 2011 at 1:45 PM, John Norman <[email protected]> wrote:
> > Here's what you want -- from: http://status.aws.amazon.com/
> >
> > The last three provide the most information.
> >
> > 1:41 AM PDT We are currently investigating latency and error rates with EBS volumes and connectivity issues reaching EC2 instances in the US-EAST-1 region.
> >
> > 2:18 AM PDT We can confirm connectivity errors impacting EC2 instances and increased latencies impacting EBS volumes in multiple availability zones in the US-EAST-1 region. Increased error rates are affecting EBS CreateVolume API calls. We continue to work towards resolution.
> >
> > 2:49 AM PDT We are continuing to see connectivity errors impacting EC2 instances, increased latencies impacting EBS volumes in multiple availability zones in the US-EAST-1 region, and increased error rates affecting EBS CreateVolume API calls. We are also experiencing delayed launches for EBS backed EC2 instances in affected availability zones in the US-EAST-1 region. We continue to work towards resolution.
> >
> > 3:20 AM PDT Delayed EC2 instance launches and EBS API error rates are recovering. We're continuing to work towards full resolution.
> >
> > 4:09 AM PDT EBS volume latency and API errors have recovered in one of the two impacted Availability Zones in US-EAST-1. We are continuing to work to resolve the issues in the second impacted Availability Zone. The errors, which started at 12:55 AM PDT, began recovering at 2:55 AM PDT.
> >
> > 5:02 AM PDT Latency has recovered for a portion of the impacted EBS volumes. We are continuing to work to resolve the remaining issues with EBS volume latency and error rates in a single Availability Zone.
> >
> > 6:09 AM PDT EBS API errors and volume latencies in the affected availability zone remain. We are continuing to work towards resolution.
> >
> > 6:59 AM PDT There has been a moderate increase in error rates for CreateVolume. This may impact the launch of new EBS-backed EC2 instances in multiple availability zones in the US-EAST-1 region. Launches of instance store AMIs are currently unaffected. We are continuing to work on resolving this issue.
> >
> > 7:40 AM PDT In addition to the EBS volume latencies, EBS-backed instances in the US-EAST-1 region are failing at a high rate. This is due to a high error rate for creating new volumes in this region.
> >
> > 8:54 AM PDT We'd like to provide additional color on what we're working on right now (please note that we always know more and understand issues better after we fully recover and dive deep into the post mortem). A networking event early this morning triggered a large amount of re-mirroring of EBS volumes in US-EAST-1. This re-mirroring created a shortage of capacity in one of the US-EAST-1 Availability Zones, which impacted new EBS volume creation as well as the pace with which we could re-mirror and recover affected EBS volumes. Additionally, one of our internal control planes for EBS has become inundated such that it's difficult to create new EBS volumes and EBS backed instances. We are working as quickly as possible to add capacity to that one Availability Zone to speed up the re-mirroring, and working to restore the control plane issue. We're starting to see progress on these efforts, but are not there yet. We will continue to provide updates when we have them.
> >
> > 10:26 AM PDT We have made significant progress in stabilizing the affected EBS control plane service. EC2 API calls that do not involve EBS resources in the affected Availability Zone are now seeing significantly reduced failures and latency and are continuing to recover. We have also brought additional capacity online in the affected Availability Zone and stuck EBS volumes (those that were being remirrored) are beginning to recover. We cannot yet estimate when these volumes will be completely recovered, but we will provide an estimate as soon as we have sufficient data to estimate the recovery. We have all available resources working to restore full service functionality as soon as possible. We will continue to provide updates when we have them.
> >
> > 11:09 AM PDT A number of people have asked us for an ETA on when we'll be fully recovered. We deeply understand why this is important and promise to share this information as soon as we have an estimate that we believe is close to accurate. Our high-level ballpark right now is that the ETA is a few hours. We can assure you that all hands are on deck to recover as quickly as possible. We will update the community as we have more information.
> >
> > On Thu, Apr 21, 2011 at 1:22 PM, Shannon Perkins <[email protected]> wrote:
> >> I'm a total lurker on this list, but I give a strong second to Eric's comment.
> >>
> >> Whatever the technical explanation/root cause turns out to be, this is not acceptable platform behavior.
> >>
> >> Very troubling.
> >>
> >> --sp
> >>
> >> On Thu, Apr 21, 2011 at 2:06 PM, Eric Anderson <[email protected]> wrote:
> >>> On Apr 21, 11:50 am, Rohit Dewan <[email protected]> wrote:
> >>> > Does anyone know why Heroku is not able to redeploy onto another cluster? In general, it would seem prudent to spread applications across the various clusters so all apps do not suffer an outage when a single cluster is affected.
> >>>
> >>> I agree completely. I was surprised to see that problems in just one of Amazon's MANY data centers took Heroku down. Even their own website and their own support system are down. I thought the point of the cloud is to have your app stay up even if there are problems at one data center.
> >>>
> >>> Eric
> >>
> >> --
> >> Shannon Perkins
> >> Editor of Interactive News Technologies
> >> Wired.com
> >> 415-276-4914
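For anyone wondering what "spread applications across the various clusters" would actually look like if you ran on raw EC2 instead of Heroku: availability zones are exposed directly in the EC2 API, so you can launch the same image into several of them and balance traffic across the results. Here is a minimal sketch using the boto EC2 library -- the AMI ID and instance type are placeholders, and this is emphatically not how Heroku provisions dynos; it only illustrates the "don't put everything in one zone" point Rohit and Eric raise above.

    # Hypothetical sketch: launch one copy of an AMI into each availability
    # zone of us-east-1, skipping zones that are impaired or rejecting launches.
    # AMI_ID and the instance type are placeholders, not anything Heroku uses.
    import boto.ec2
    from boto.exception import EC2ResponseError

    AMI_ID = 'ami-00000000'   # placeholder AMI
    REGION = 'us-east-1'

    conn = boto.ec2.connect_to_region(REGION)

    launched = []
    for zone in conn.get_all_zones():
        if zone.state != 'available':
            print "skipping %s (state: %s)" % (zone.name, zone.state)
            continue
        try:
            # 'placement' pins the instance to a specific availability zone
            reservation = conn.run_instances(AMI_ID,
                                             instance_type='m1.small',
                                             placement=zone.name)
            launched.extend(reservation.instances)
            print "launched %s in %s" % (reservation.instances[0].id, zone.name)
        except EC2ResponseError, e:
            # e.g. capacity or launch errors during an incident like today's
            print "could not launch in %s: %s" % (zone.name, e.error_code)

    print "%d instances launched across %d zones" % (
        len(launched), len(set(i.placement for i in launched)))

You would still need an Elastic Load Balancer or DNS failover in front of those instances, and today's event shows zone isolation isn't absolute (the EBS API problems touched multiple zones), but the hooks for spreading out are there.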
