All,

Thank you, Ilya and Haijiao, for your words of encouragement. In addition to the efforts of Paul, Rohit, Murali, Abhi, and Bobby, Sergey Levitskiy has been providing great help testing VMware.

I apologize for my sporadic status updates. We have made significant progress in getting smoke tests to pass on KVM, XenServer, and VMware. Currently, we have the following number of failures and errors:

* KVM: 0
* VMware: 4
* XenServer: 8

The outstanding failures and errors seem to be caused by the following issues:

1. On VMware and XenServer, guest VMs in VPCs start but don't acquire IP addresses, causing tests that rely on SSH connectivity to fail. The issue does not occur on KVM, occurs intermittently on VMware, and occurs consistently on XenServer. It affects the test_vpc_redundant, test_privategw_acl, and test_vpc_vpn test suites. We believe that it may be caused either by the guest VMs' startup/DHCP wait period winning the race with the VPC VR configuration or by a problem with the VPC VR assigning IP addresses. We are currently investigating and expect to identify the root cause shortly.

2. SSVM downloads are being restarted due to ping timeouts on XenServer and VMware. We are seeing messages such as the following in the Management Server logs:

   com.cloud.utils.exception.CloudRuntimeException: Failed to send command, due to Agent:5,com.cloud.exception.OperationTimedoutException: Commands 9042102151853113352 to Host 5 timed out after 2400

   Our initial investigation discovered that the system VM templates and the Management Server were using different time zones. To address this discrepancy, we have modified Trillian to ensure consistent configuration of time zones across a cluster, and are preparing another run for XenServer and VMware. KVM is not affected by this time zone issue because KVM hosts use the same CentOS template as CentOS-based Management Servers, which creates time zone consistency as a side effect.

Reports of each test run are available on PR #1692 [1]. We have kicked off a new round of tests on KVM, VMware, and XenServer with the time zone fix and additional instrumentation to run down the VPC VR race condition.

Instead of directly forward merging these changes, we plan to open a PR for each forward merge. Since we are very close to having 4.8 resolved, Rohit has opened PR 1703 [2] for the 4.9 forward merge and kicked off a test run. While we cannot close this PR until 1692 is complete, we are hoping to get a head start on any issues in the 4.9 branch.

Thank you again for your patience,
-John

[1]: https://github.com/apache/cloudstack/pull/1692
[2]: https://github.com/apache/cloudstack/pull/1703

> On Oct 5, 2016, at 4:32 AM, Haijiao <18602198...@163.com> wrote:
>
> Though I am one of the silent majority, I would like to thank John and the dev team for
> your continuous effort; you keep ACS alive and better!
>
> Just heard that one of the biggest finance companies in China is running 10,000+ VMs on ACS
> 4.4 for production/dev/QAS. You guys should be proud of that.
> Salute to you!
>
> On Oct 5, 2016, at 03:09, "ilya" <ilya.mailing.li...@gmail.com> wrote:
>
> John and Team
>
> Thanks for amazing work and contributing back.
>
> Regards,
> ilya
>
> On 10/3/16 9:48 PM, John Burwell wrote:
>> All,
>>
>> A quick update on our progress to pass all smoke tests, aka super green. We
>> have reduced the failures and errors for XenServer from 93 to 9 and for
>> VMware from 51 to 14. A CentOS 6/CentOS 6 KVM run is currently executing.
>> Based on manual tests/fixes, we are expecting it to be the first super green
>> configuration. We have also found the following additional defects:
>>
>> * CLOUDSTACK-9528 [2]: SSVM Downloads (built-in) Template Multiple Times
>> * CLOUDSTACK-9529 [3]: Marvin Tests Do Not Clean Up Properly
>>
>> 9528 is causing XenServer environments to fail to install and start up
>> cleanly. A lack of cleanup described in 9529 is causing XenServer to
>> exhaust available resources before a test run completes. We believe that
>> resolution of these issues will address most, if not all, of the XenServer
>> issues.
>>
>> Thanks,
>> -John
>>
>> [1]: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65873020
>> [2]: https://issues.apache.org/jira/browse/CLOUDSTACK-9528
>> [3]: https://issues.apache.org/jira/browse/CLOUDSTACK-9529
>>
>> john.burw...@shapeblue.com
>> www.shapeblue.com
>> 53 Chandos Place, Covent Garden, London VA WC2N 4HS UK
>> @shapeblue
>>
>> On Sep 30, 2016, at 2:40 AM, John Burwell <john.burw...@shapeblue.com> wrote:
>>>
>>> All,
>>>
>>> Using blueorangutan, Rohit, Murali, Boris, Paul, Abhi, and I are executing
>>> the smoke tests for the 4.8, 4.9, and master branches against the following
>>> environments:
>>>
>>> * CentOS 7.2 Management Server + VMware 5.5u3 + NFS Primary/Secondary Storage
>>> * CentOS 7.2 Management Server + XenServer 6.5SP1 + NFS Primary/Secondary Storage
>>> * CentOS 7.2 Management Server + CentOS 7.2 KVM + NFS Primary/Secondary Storage
>>>
>>> Thus far, we have found seven (7) test case and/or CloudStack defects in
>>> the VMware run for the 4.8 branch [1]. We are currently triaging fifty-one
>>> (51) new issues from the XenServer run to determine which issues are
>>> environmental and which are defects. This triage work should be completed today (30
>>> Sept 2016). Finally, we are awaiting the results of the KVM run.
>>>
>>> We are using PR #1692 [2] as the master tracking PR to fix all defects in
>>> the 4.8 branch. Our goal is to get all non-skip tests to pass and then
>>> merge this PR to the 4.8, 4.9, and master branches. For each bug, we are creating a
>>> JIRA ticket and adding a commit to the PR. Currently, the branch for this
>>> PR is in the shapeblue repo (the branch started with a much smaller fix
>>> from Paul and we just kept using it). However, if others are interested in
>>> picking up defects, we will move it to the ASF repo.
>>> Once the 4.8 branch is stabilized, we plan to re-execute these tests on
>>> the 4.9 and master branches, as we expect that the 4.9 and master branches
>>> will have additional issues.
>>>
>>> Since we are in a test freeze, I propose that no further PRs are merged to
>>> the 4.8, 4.9, and master branches until they are stabilized. The following
>>> PRs will be re-based, re-tested, and merged to 4.8, 4.9.1.0, and/or
>>> 4.10.0.0 post-stabilization:
>>>
>>> * 1696
>>> * 1694
>>> * 1684
>>> * 1681
>>> * 1680
>>> * 1678
>>> * 1677
>>> * 1676
>>> * 1674
>>> * 1673
>>> * 1642
>>> * 1624
>>> * 1615
>>> * 1600
>>> * 1545
>>> * 1542
>>>
>>> I recognize that this is a large backlog of contributions ready to merge, and
>>> apologize for asking folks to wait. However, given the current state of the
>>> release branches, merging them before we complete fixing the smoke tests
>>> would create a moving target that would further delay stabilization.
>>>
>>> Obviously, it is unlikely we will make the 10 October 2016 release date for
>>> the 4.8.2.0, 4.9.1.0, and 4.10.0.0 releases. At this point, it is
>>> difficult to estimate the size of the schedule slip because we still have
>>> issues to triage and test runs to complete. I have created a wiki page [2]
>>> to track progress on this effort.
>>>
>>> Does this approach sound reasonable? Any suggestions to speed up this
>>> process will be greatly appreciated, as stabilizing and re-opening these
>>> branches ASAP is critical for the community.
>>>
>>> Thanks,
>>> -John
>>>
>>> [1]: https://issues.apache.org/jira/browse/CLOUDSTACK-9518?jql=project%20%3D%20CLOUDSTACK%20AND%20fixVersion%20in%20(4.8.2.0)%20AND%20labels%20in%20(4.8.2.0-smoke-test-failure)
>>> [2]: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65873020
>>>
>>>> On Sep 26, 2016, at 8:38 AM, Will Stevens <wstev...@cloudops.com> wrote:
>>>>
>>>> Yes, I think it is important that you or Rajani sign off on anything that
>>>> gets in while branches are frozen so you guys can stay on top of what goes
>>>> in.
>>>>
>>>> Thanks for all the hard work team. :)
>>>>
>>>> *Will STEVENS*
>>>> Lead Developer
>>>>
>>>> *CloudOps* *|* Cloud Solutions Experts
>>>> 420 rue Guy *|* Montreal *|* Quebec *|* H3J 1S6
>>>> w cloudops.com *|* tw @CloudOps_
>>>>
>>>> On Mon, Sep 26, 2016 at 2:10 AM, John Burwell <john.burw...@shapeblue.com>
>>>> wrote:
>>>>
>>>>> All,
>>>>>
>>>>> Per our release schedule [1], the 4.8, 4.9, and master branches are frozen
>>>>> for testing. There are some straggling PRs that Rajani and I are working
>>>>> to merge. Is it acceptable to everyone that for the next two (2) weeks,
>>>>> all PRs require not only 2 LGTMs, but approval by Rajani or me to be merged
>>>>> to these branches? To be clear, we don't have to perform the merges,
>>>>> simply give a thumbs up.
>>>>>
>>>>> Thanks,
>>>>> -John