Great work everyone. Don't worry about the sporadic updates, that is just the nature of the beast when working through stuff like this. Well done so far...
*Will STEVENS*
Lead Developer

*CloudOps* *| *Cloud Solutions Experts
420 rue Guy *|* Montreal *|* Quebec *|* H3J 1S6
w cloudops.com *|* tw @CloudOps_

On Fri, Oct 7, 2016 at 9:53 AM, John Burwell <john.burw...@shapeblue.com> wrote:

> All,
>
> Thank you Ilya and Haijiao for your words of encouragement. In addition to
> the efforts of Paul, Rohit, Murali, Abhi, and Bobby, Sergey Levitskiy has
> been providing great help testing VMware.
>
> I apologize for my sporadic status updates. We have made significant
> progress in getting smoke tests to pass on KVM, XenServer, and VMware.
> Currently, we have the following number of failures and errors:
>
>    * KVM: 0
>    * VMware: 4
>    * XenServer: 8
>
> The outstanding failures and errors seem to be caused by the following
> issues:
>
>    1. On VMware and XenServer, guest VMs in VPCs start but don’t
>       acquire IP addresses, causing tests that rely on SSH connectivity to
>       fail. The issue does not occur on KVM, occurs intermittently on
>       VMware, and occurs consistently on XenServer. It affects the
>       test_vpc_redundant, test_privategw_acl, and test_vpc_vpn test suites.
>       We believe that it may be caused either by the guest VMs' startup/DHCP
>       wait period winning the race with the VPC VR configuration or by a
>       problem with the VPC VR assigning IP addresses. We are currently
>       investigating and expect to identify the root cause shortly.
>    2. SSVM downloads are being restarted due to ping timeouts on
>       XenServer and VMware. We are seeing messages such as the
>       following in the Management Server logs:
>
>           com.cloud.utils.exception.CloudRuntimeException: Failed to send command, due to Agent:5,com.cloud.exception.OperationTimedoutException: Commands 9042102151853113352 to Host 5 timed out after 2400
>
>       Our initial investigation discovered different time zones being
>       used by the system VM templates and Management Server.
>       To address this discrepancy, we have modified Trillian to ensure
>       consistent configuration of time zones across a cluster, and are
>       preparing another run for XenServer and VMware. KVM is not affected
>       by this time zone issue because KVM hosts use the same CentOS
>       template as CentOS-based Management Servers, creating time zone
>       consistency as a side effect.
>
> Reports of each test run are available on PR #1692 [1]. We have kicked off
> a new round of tests on KVM, VMware, and XenServer with the time zone fix
> and additional instrumentation to run down the VPC VR race condition.
>
> Instead of directly forward merging these changes, we plan to open a PR
> for each forward merge. Since we are very close to having 4.8 resolved,
> Rohit has opened PR 1703 [2] for the 4.9 forward merge and kicked off a
> test run. While we cannot close this PR until 1692 is complete, we are
> hoping to get a head start on any issues in the 4.9 branch.
>
> Thank you again for your patience,
> -John
>
> [1]: https://github.com/apache/cloudstack/pull/1692
> [2]: https://github.com/apache/cloudstack/pull/1703
>
> On Oct 5, 2016, at 4:32 AM, Haijiao <18602198...@163.com> wrote:
> >
> > Though I am one of the silent majority, I would like to thank John and
> > the dev team for your continuous effort. You keep ACS alive and better!
> >
> > I just heard that one of the biggest finance companies in China is
> > running 10,000+ VMs on ACS 4.4 for production/dev/QAS. You guys should
> > be proud of that.
> > Salute to you!
> >
> > At 03:09 on Oct 5, 2016, "ilya" <ilya.mailing.li...@gmail.com> wrote:
> >
> > John and Team
> >
> > Thanks for amazing work and contributing back.
> >
> > Regards,
> > ilya
> >
> > On 10/3/16 9:48 PM, John Burwell wrote:
> >> All,
> >>
> >> A quick update on our progress to pass all smoke tests aka super
> >> green. We have reduced the failures and errors for XenServer from 93
> >> to 9 and for VMware from 51 to 14. A CentOS 6/CentOS 6 KVM run is
> >> currently executing.
> >> Based on manual tests/fixes, we are expecting it to be the first
> >> super green configuration. We have also found the following additional
> >> defects:
> >>
> >>    * CLOUDSTACK-9528 [2]: SSVM Downloads (built-in) Template Multiple Times
> >>    * CLOUDSTACK-9529 [3]: Marvin Tests Do Not Clean Up Properly
> >>
> >> 9528 is causing XenServer environments to fail to install and start up
> >> cleanly. The lack of cleanup described in 9529 is causing XenServer to
> >> exhaust available resources before a test run completes. We believe that
> >> resolution of these issues will address most, if not all, of the
> >> XenServer issues.
> >>
> >> Thanks,
> >> -John
> >>
> >> [1]: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65873020
> >> [2]: https://issues.apache.org/jira/browse/CLOUDSTACK-9528
> >> [3]: https://issues.apache.org/jira/browse/CLOUDSTACK-9529
> >>
> >> john.burw...@shapeblue.com
> >> www.shapeblue.com
> >> 53 Chandos Place, Covent Garden, London VA WC2N 4HSUK
> >> @shapeblue
> >>
> >> On Sep 30, 2016, at 2:40 AM, John Burwell <john.burw...@shapeblue.com> wrote:
> >>>
> >>> All,
> >>>
> >>> Using blueorangutan, Rohit, Murali, Boris, Paul, Abhi, and I are
> >>> executing the smoke tests for the 4.8, 4.9, and master branches against
> >>> the following environments:
> >>>
> >>>    * CentOS 7.2 Management Server + VMware 5.5u3 + NFS Primary/Secondary Storage
> >>>    * CentOS 7.2 Management Server + XenServer 6.5SP1 + NFS Primary/Secondary Storage
> >>>    * CentOS 7.2 Management Server + CentOS 7.2 KVM + NFS Primary/Secondary Storage
> >>>
> >>> Thus far, we have found seven (7) test case and/or CloudStack defects
> >>> in the VMware run for the 4.8 branch [1]. We are currently triaging
> >>> fifty-one (51) new issues from the XenServer run to determine which
> >>> are environmental and which are defects. This triage work should be
> >>> completed today (30 Sept 2016). Finally, we are awaiting the results of
> >>> the KVM run.
> >>>
> >>> We are using PR #1692 [2] as the master tracking PR to fix all defects
> >>> in the 4.8 branch. Our goal is to get all non-skip tests to pass and then
> >>> merge this PR to the 4.8, 4.9, and master branches. For each bug, we are
> >>> creating a JIRA ticket and adding a commit to the PR. Currently, the
> >>> branch for this PR is in the shapeblue repo (the branch started with a
> >>> much smaller fix from Paul and we just kept using it). However, if others
> >>> are interested in picking up defects, we will move it to the ASF repo.
> >>> Once the 4.8 branch is stabilized, we plan to re-execute these tests on
> >>> the 4.9 and master branches, as we expect that they will have additional
> >>> issues.
> >>>
> >>> Since we are in a test freeze, I propose that no further PRs be
> >>> merged to the 4.8, 4.9, and master branches until they are stabilized.
> >>> The following PRs will be re-based, re-tested, and merged to 4.8, 4.9.1.0,
> >>> and/or 4.10.0.0 post-stabilization:
> >>>
> >>>    * 1696
> >>>    * 1694
> >>>    * 1684
> >>>    * 1681
> >>>    * 1680
> >>>    * 1678
> >>>    * 1677
> >>>    * 1676
> >>>    * 1674
> >>>    * 1673
> >>>    * 1642
> >>>    * 1624
> >>>    * 1615
> >>>    * 1600
> >>>    * 1545
> >>>    * 1542
> >>>
> >>> I recognize that this is a large backlog of contributions ready to merge,
> >>> and apologize for asking folks to wait. However, given the current state
> >>> of the release branches, merging them before we finish fixing the smoke
> >>> tests would create a moving target that would further delay stabilization.
> >>>
> >>> Obviously, it is unlikely we will make the 10 October 2016 release
> >>> date for the 4.8.2.0, 4.9.1.0, and 4.10.0.0 releases. At this point, it is
> >>> difficult to estimate the size of the schedule slip because we still have
> >>> issues to triage and test runs to complete. I have created a wiki page [2]
> >>> to track progress on this effort.
> >>>
> >>> Does this approach sound reasonable?
> >>> Any suggestions to speed up this process will be greatly appreciated,
> >>> as stabilizing and re-opening these branches ASAP is critical for the
> >>> community.
> >>>
> >>> Thanks,
> >>> -John
> >>>
> >>> [1]: https://issues.apache.org/jira/browse/CLOUDSTACK-9518?jql=project%20%3D%20CLOUDSTACK%20AND%20fixVersion%20in%20(4.8.2.0)%20AND%20labels%20in%20(4.8.2.0-smoke-test-failure)
> >>> [2]: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65873020
> >>>
> >>>> On Sep 26, 2016, at 8:38 AM, Will Stevens <wstev...@cloudops.com> wrote:
> >>>>
> >>>> Yes, I think it is important that you or Rajani sign off on anything
> >>>> that gets in while branches are frozen so you guys can stay on top of
> >>>> what goes in.
> >>>>
> >>>> Thanks for all the hard work team. :)
> >>>>
> >>>> *Will STEVENS*
> >>>> Lead Developer
> >>>>
> >>>> *CloudOps* *| *Cloud Solutions Experts
> >>>> 420 rue Guy *|* Montreal *|* Quebec *|* H3J 1S6
> >>>> w cloudops.com *|* tw @CloudOps_
> >>>>
> >>>> On Mon, Sep 26, 2016 at 2:10 AM, John Burwell <john.burw...@shapeblue.com>
> >>>> wrote:
> >>>>
> >>>>> All,
> >>>>>
> >>>>> Per our release schedule [1], the 4.8, 4.9, and master branches are
> >>>>> frozen for testing. There are some straggling PRs that Rajani and I are
> >>>>> working to merge. Is it acceptable to everyone that for the next two (2)
> >>>>> weeks, all PRs require not only 2 LGTMs, but also approval by Rajani or
> >>>>> me to be merged to these branches? To be clear, we don’t have to
> >>>>> perform the merges, simply give a thumbs up.
> >>>>>
> >>>>> Thanks,
> >>>>> -John