Great work, everyone.  Don't worry about the sporadic updates; that is just
the nature of the beast when working through something like this.  Well done so
far...

*Will STEVENS*
Lead Developer

*CloudOps* *|* Cloud Solutions Experts
420 rue Guy *|* Montreal *|* Quebec *|* H3J 1S6
w cloudops.com *|* tw @CloudOps_

On Fri, Oct 7, 2016 at 9:53 AM, John Burwell <john.burw...@shapeblue.com>
wrote:

> All,
>
> Thank you, Ilya and Haijiao, for your words of encouragement.  In addition
> to the efforts of Paul, Rohit, Murali, Abhi, and Bobby, Sergey Levitskiy has
> been providing great help testing VMware.
>
> I apologize for my sporadic status updates.  We have made significant
> progress in getting the smoke tests to pass on KVM, XenServer, and VMware.
> Currently, we have the following numbers of failures and errors:
>
>         * KVM: 0
>         * VMware: 4
>         * XenServer: 8
>
> The outstanding failures and errors seem to be caused by the following
> issues:
>
>         1. On VMware and XenServer, guest VMs in VPCs start but don’t
> acquire IP addresses, causing tests that rely on SSH connectivity to
> fail.  The issue does not occur on KVM; it occurs intermittently on VMware
> and consistently on XenServer.  This issue affects the test_vpc_redundant,
> test_privategw_acl, and test_vpc_vpn test suites.  We believe that this
> issue may be caused either by the guest VMs' startup/DHCP wait period
> winning the race with the VPC VR configuration or by a problem in the
> VPC VR assigning IP addresses.  We are currently investigating and expect
> to identify the root cause shortly.
>         2. SSVM downloads are being restarted due to ping timeouts on
> XenServer and VMware.  We are seeing messages such as the following in
> the Management Server logs:
>
>                 com.cloud.utils.exception.CloudRuntimeException: Failed to send command, due to Agent:5,com.cloud.exception.OperationTimedoutException: Commands 9042102151853113352 to Host 5 timed out after 2400
>
>           Our initial investigation discovered that the system VM templates
> and the Management Server use different time zones.  To address this
> discrepancy, we have modified Trillian to ensure consistent configuration
> of time zones across a cluster, and we are preparing another run for
> XenServer and VMware.  KVM is not affected by this time zone issue because
> KVM hosts use the same CentOS template as CentOS-based Management Servers,
> which creates time zone consistency as a side effect.
>
> Reports of each test run are available on PR #1692 [1].  We have kicked off
> a new round of tests on KVM, VMware, and XenServer with the time zone fix and
> additional instrumentation to run down the VPC VR race condition.
>
> Instead of directly forward merging these changes, we plan to open a PR
> for each forward merge.  Since we are very close to having 4.8 resolved,
> Rohit has opened PR 1703 [2] for the 4.9 forward merge and kicked off a test
> run.  While we cannot close this PR until 1692 is complete, we hope to get
> a head start on any issues in the 4.9 branch.
>
> Thank you again for your patience,
> -John
>
> [1]: https://github.com/apache/cloudstack/pull/1692
> [2]: https://github.com/apache/cloudstack/pull/1703
>
> > On Oct 5, 2016, at 4:32 AM, Haijiao <18602198...@163.com> wrote:
> >
> > Though I am one of the silent majority, I would like to thank John and the
> > dev team for your continuous effort; you keep ACS alive and make it better!
> >
> >
> > I just heard that one of the biggest finance companies in China is running
> > 10,000+ VMs on ACS 4.4 for production/dev/QAS; you guys should be proud of
> > that.
> > Salute to you!
> >
> > On 2016-10-05 at 03:09, "ilya" <ilya.mailing.li...@gmail.com> wrote:
> >
> > John and Team
> >
> > Thanks for the amazing work and for contributing back.
> >
> > Regards,
> > ilya
> >
> > On 10/3/16 9:48 PM, John Burwell wrote:
> >> All,
> >>
> >> A quick update on our progress toward passing all smoke tests (a.k.a.
> >> super green).  We have reduced the failures and errors for XenServer from
> >> 93 to 9 and for VMware from 51 to 14.  A CentOS 6/CentOS 6 KVM run is
> >> currently executing.  Based on manual tests/fixes, we expect it to be the
> >> first super green configuration.  We have also found the following
> >> additional defects:
> >>
> >>  * CLOUDSTACK-9528 [2]: SSVM Downloads (built-in) Template Multiple Times
> >>  * CLOUDSTACK-9529 [3]: Marvin Tests Do Not Clean Up Properly
> >>
> >> 9528 is causing XenServer environments to fail to install and start up
> >> cleanly.  The lack of cleanup described in 9529 is causing XenServer to
> >> exhaust available resources before a test run completes.  We believe that
> >> resolving these issues will address most, if not all, of the XenServer
> >> issues.
> >>
> >> Thanks,
> >> -John
> >>
> >> [1]: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65873020
> >> [2]: https://issues.apache.org/jira/browse/CLOUDSTACK-9528
> >> [3]: https://issues.apache.org/jira/browse/CLOUDSTACK-9529
> >>
> >>>
> >> john.burw...@shapeblue.com
> >> www.shapeblue.com
> >> 53 Chandos Place, Covent Garden, London VA WC2N 4HSUK
> >> @shapeblue
> >>
> >>
> >>
> >> On Sep 30, 2016, at 2:40 AM, John Burwell <john.burw...@shapeblue.com> wrote:
> >>>
> >>> All,
> >>>
> >>> Using blueorangutan, Rohit, Murali, Boris, Paul, Abhi, and I are executing
> >>> the smoke tests for the 4.8, 4.9, and master branches against the
> >>> following environments:
> >>>
> >>>    * CentOS 7.2 Management Server + VMware 5.5u3 + NFS Primary/Secondary Storage
> >>>    * CentOS 7.2 Management Server + XenServer 6.5SP1 + NFS Primary/Secondary Storage
> >>>    * CentOS 7.2 Management Server + CentOS 7.2 KVM + NFS Primary/Secondary Storage
> >>>
> >>> Thus far, we have found seven (7) test case and/or CloudStack defects in
> >>> the VMware run for the 4.8 branch [1].  We are currently triaging
> >>> fifty-one (51) new issues from the XenServer run to determine which issues
> >>> were environmental and which were defects.  This triage work should be
> >>> completed today (30 Sept 2016).  Finally, we are awaiting the results of
> >>> the KVM run.
> >>>
> >>> We are using PR #1692 [2] as the master tracking PR to fix all defects in
> >>> the 4.8 branch.  Our goal is to get all non-skip tests to pass and then
> >>> merge this PR to the 4.8, 4.9, and master branches.  For each bug, we are
> >>> creating a JIRA ticket and adding a commit to the PR.  Currently, the
> >>> branch for this PR is in the shapeblue repo (the branch started with a
> >>> much smaller fix from Paul and we just kept using it).  However, if others
> >>> are interested in picking up defects, we will move it to the ASF repo.
> >>> Once the 4.8 branch is stabilized, we plan to re-execute these tests on
> >>> the 4.9 and master branches, as we expect that those branches will have
> >>> additional issues.
> >>>
> >>> Since we are in a test freeze, I propose that no further PRs be merged to
> >>> the 4.8, 4.9, and master branches until they are stabilized.  The
> >>> following PRs will be rebased, re-tested, and merged to 4.8, 4.9.1.0,
> >>> and/or 4.10.0.0 post-stabilization:
> >>>
> >>>    * 1696
> >>>    * 1694
> >>>    * 1684
> >>>    * 1681
> >>>    * 1680
> >>>    * 1678
> >>>    * 1677
> >>>    * 1676
> >>>    * 1674
> >>>    * 1673
> >>>    * 1642
> >>>    * 1624
> >>>    * 1615
> >>>    * 1600
> >>>    * 1545
> >>>    * 1542
> >>>
> >>> I recognize that this is a large backlog of contributions ready to merge,
> >>> and I apologize for asking folks to wait.  However, given the current
> >>> state of the release branches, merging them before we finish fixing the
> >>> smoke tests would create a moving target that further delays
> >>> stabilization.
> >>>
> >>> Obviously, it is unlikely we will make the 10 October 2016 release date
> >>> for the 4.8.2.0, 4.9.1.0, and 4.10.0.0 releases.  At this point, it is
> >>> difficult to estimate the size of the schedule slip because we still have
> >>> issues to triage and test runs to complete.  I have created a wiki page
> >>> [2] to track progress on this effort.
> >>>
> >>> Does this approach sound reasonable?  Any suggestions to speed up this
> >>> process will be greatly appreciated, as stabilizing and re-opening these
> >>> branches ASAP is critical for the community.
> >>>
> >>> Thanks,
> >>> -John
> >>>
> >>> [1]: https://issues.apache.org/jira/browse/CLOUDSTACK-9518?jql=project%20%3D%20CLOUDSTACK%20AND%20fixVersion%20in%20(4.8.2.0)%20AND%20labels%20in%20(4.8.2.0-smoke-test-failure)
> >>> [2]: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65873020
> >>>
> >>>> On Sep 26, 2016, at 8:38 AM, Will Stevens <wstev...@cloudops.com> wrote:
> >>>>
> >>>> Yes, I think it is important that you or Rajani sign off on anything
> >>>> that gets in while branches are frozen, so you guys can stay on top of
> >>>> what goes in.
> >>>>
> >>>> Thanks for all the hard work team.  :)
> >>>>
> >>>>
> >>>> On Mon, Sep 26, 2016 at 2:10 AM, John Burwell <john.burw...@shapeblue.com> wrote:
> >>>>
> >>>>> All,
> >>>>>
> >>>>> Per our release schedule [1], the 4.8, 4.9, and master branches are
> >>>>> frozen for testing.  There are some straggling PRs that Rajani and I are
> >>>>> working to merge.  Is it acceptable to everyone that, for the next two
> >>>>> (2) weeks, all PRs require not only 2 LGTMs but also approval by Rajani
> >>>>> or me to be merged to these branches?  To be clear, we don’t have to
> >>>>> perform the merges, simply give a thumbs up.
> >>>>>
> >>>>> Thanks,
> >>>>> -John
> >>>
> >>>
> >>
>
>
