All,

We have made great strides stabilizing the 4.8 [1] and 4.9 [2] smoke tests.  
While we are not super green, the following remaining failures/issues are 
isolated to the VPC VR and secondary storage.  

        * CLOUDSTACK-9541: redundant VPC VR: issues when master and backup 
switch happens on failover [3]
        * CLOUDSTACK-9540: createPrivateGateway create private network does not 
create proper VLAN network on XenServer [4]
        * CLOUDSTACK-9528: SSVM Downloads (built-in) template multiple times

Therefore, I would like to merge these two PRs so that we can begin the process 
of rebasing and retesting the PRs slotted for 4.8 and 4.9 that are not affected 
by these issues (i.e. PRs unrelated to secondary storage or the VR).  Our hope 
is that we can correct these issues quickly, and by the time we have worked 
through the backlog of pending PRs, these issues will be addressed and we can 
move those impacted forward.

Unfortunately, the master PR [5] has 6 failures and 4 errors on XenServer [6] 
that we are currently analyzing.  We hope to have these resolved shortly in 
order to begin progressing PRs targeting master.

I would like to get 1692 [1] and 1703 [2] merged in the next 24 hours.  We need 
to complete the following actions in order to accomplish this goal:

        * Obtain at least one code review LGTM on PR #1692 [1]
        * Obtain at least one code review LGTM on PR #1703 [2]
        * Obtain at least one test review LGTM on PR #1703 [2]

Once these PRs are merged, I will be updating PRs slotted for 4.8 and 4.9 to ping authors 
for a rebase.  Following each rebase, we will trigger blueorangutan to retest 
each one.

Thanks again for your patience and assistance,
-John

[1]: https://github.com/apache/cloudstack/pull/1692
[2]: https://github.com/apache/cloudstack/pull/1703
[3]: https://issues.apache.org/jira/browse/CLOUDSTACK-9541
[4]: https://issues.apache.org/jira/browse/CLOUDSTACK-9540
[5]: https://github.com/apache/cloudstack/pull/1708
[6]: https://github.com/apache/cloudstack/pull/1708#issuecomment-253698099

> On Oct 7, 2016, at 10:12 AM, Will Stevens <wstev...@cloudops.com> wrote:
> 
> Great work everyone.  Don't worry about the sporadic updates, that is just
> the nature of the beast when working through stuff like this.  Well done so
> far...
> 
> *Will STEVENS*
> Lead Developer
> 
> *CloudOps* *| *Cloud Solutions Experts
> 420 rue Guy *|* Montreal *|* Quebec *|* H3J 1S6
> w cloudops.com *|* tw @CloudOps_
> 
> On Fri, Oct 7, 2016 at 9:53 AM, John Burwell <john.burw...@shapeblue.com>
> wrote:
> 
>> All,
>> 
>> Thank you Ilya and Haijiao for your words of encouragement.  In addition to
>> the efforts of Paul, Rohit, Murali, Abhi, and Bobby, Sergey Levitskiy has
>> been providing great help testing VMware.
>> 
>> I apologize for my sporadic status updates.  We have made significant
>> progress in getting smoke tests to pass on KVM, XenServer, and VMware.
>> Currently, we have the following number of failures and errors:
>> 
>>        * KVM: 0
>>        * VMware: 4
>>        * XenServer: 8
>> 
>> The outstanding failures and errors seem to be caused by the following
>> issues:
>> 
>>        1. On VMware and XenServer, guest VMs in VPCs start but don’t
>> acquire IP addresses, causing tests relying on SSH connectivity to
>> fail.  The issue does not occur on KVM, occurs intermittently on VMware,
>> and occurs consistently on XenServer.  This issue affects the test_vpc_redundant,
>> test_privategw_acl, and test_vpc_vpn test suites.   We believe that this
>> issue may be caused either by the guest VMs' startup/DHCP wait period
>> winning the race with the VPC VR configuration or by a problem with the
>> VPC VR assigning IP addresses.  We are currently investigating and expect
>> to identify the root cause shortly.
>>        2. SSVM downloads are being restarted due to ping timeouts on
>> XenServer and VMware.  We are seeing messages such as the
>> following in the Management Server logs:
>> 
>>                com.cloud.utils.exception.CloudRuntimeException: Failed to send command, due to Agent:5,com.cloud.exception.OperationTimedoutException: Commands 9042102151853113352 to Host 5 timed out after 2400
>> 
>>          Our initial investigation discovered different time zones being
>> used by the system VM templates and Management Server.  To address this
>> discrepancy, we have modified Trillian to ensure consistent configuration of time zones
>> across a cluster, and are preparing another run for XenServer and VMware.
>> KVM is not affected by this time zone issue because KVM hosts use the same
>> CentOS template as CentOS based Management Servers -- creating time zone
>> consistency by side effect.
>> 
>> Reports of each test run are available on PR #1692 [1].  We have kicked off a
>> new round of tests on KVM, VMware, and XenServer with the time zone fix and
>> additional instrumentation to run down the VPC VR race condition.
>> 
>> Instead of directly forward merging these changes, we plan to open a PR
>> for each forward merge.  Since we are very close to having 4.8 resolved,
>> Rohit has opened PR 1703 [2] for the 4.9 forward merge and kicked off a test
>> run.  While we cannot close this PR until 1692 is complete, we are hoping
>> to get a head start on any issues in the 4.9 branch.
>> 
>> Thank you again for your patience,
>> -John
>> 
>> [1]: https://github.com/apache/cloudstack/pull/1692
>> [2]: https://github.com/apache/cloudstack/pull/1703
>> 
>>> On Oct 5, 2016, at 4:32 AM, Haijiao <18602198...@163.com> wrote:
>>> 
>>> Though I am one of the silent majority, I would like to thank John and the dev team
>> for your continuous effort; you keep ACS alive and better!
>>> 
>>> 
>>> I just heard that one of the biggest finance companies in China is running 10,000+ VMs
>> on ACS 4.4 for production/dev/QAS; you guys should be proud of that.
>>> Salute to you!
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> On Oct 5, 2016, at 03:09, "ilya" <ilya.mailing.li...@gmail.com> wrote:
>>> 
>>> John and Team
>>> 
>>> Thanks for the amazing work and for contributing back.
>>> 
>>> Regards,
>>> ilya
>>> 
>>> On 10/3/16 9:48 PM, John Burwell wrote:
>>>> All,
>>>> 
>>>> A quick update on our progress to pass all smoke tests aka super
>> green.  We have reduced the failures and errors for XenServer from 93 to 9
>> and for VMware from 51 to 14.  A CentOS 6/CentOS 6 KVM run is currently
>> executing.  Based on manual tests/fixes, we are expecting it to be the first
>> super green configuration.  We have also found the following additional
>> defects:
>>>> 
>>>> * CLOUDSTACK-9528 [2]: SSVM Downloads (built-in) Template Multiple
>> Times
>>>> * CLOUDSTACK-9529 [3]: Marvin Tests Do Not Clean Up Properly
>>>> 
>>>> 9528 is causing XenServer environments to fail to install and startup
>> cleanly.  A lack of cleanup described in 9529 is causing XenServer to
>> exhaust available resources before a test run completes.  We believe that
>> resolution of these issues will address most, if not all, of the XenServer
>> issues.
>>>> 
>>>> Thanks,
>>>> -John
>>>> 
>>>> [1]: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65873020
>>>> [2]: https://issues.apache.org/jira/browse/CLOUDSTACK-9528
>>>> [3]: https://issues.apache.org/jira/browse/CLOUDSTACK-9529
>>>> 
>>>>> 
>>>> john.burw...@shapeblue.com
>>>> www.shapeblue.com
>>>> 53 Chandos Place, Covent Garden, London VA WC2N 4HSUK
>>>> @shapeblue
>>>> 
>>>> 
>>>> 
>>>> On Sep 30, 2016, at 2:40 AM, John Burwell <john.burw...@shapeblue.com>
>> wrote:
>>>>> 
>>>>> All,
>>>>> 
>>>>> Using blueorangutan, Rohit, Murali, Boris, Paul, Abhi, and I are
>> executing the smoke tests for the 4.8, 4.9, and master branches against the
>> following environments:
>>>>> 
>>>>>   * CentOS 7.2 Management Server + VMware 5.5u3 + NFS
>> Primary/Secondary Storage
>>>>>   * CentOS 7.2 Management Server + XenServer 6.5SP1 + NFS
>> Primary/Secondary Storage
>>>>>   * CentOS 7.2 Management Server + CentOS 7.2 KVM + NFS
>> Primary/Secondary Storage
>>>>> 
>>>>> Thus far, we have found seven (7) test case and/or CloudStack defects
>> in the VMware run for the 4.8 branch [1].  We are currently triaging
>> fifty-one (51) new issues from the XenServer run to determine which issues
>> are environmental and which are defects.  This triage work should be completed today
>> (30 Sept 2016).  Finally, we are awaiting the results of the KVM run.
>>>>> 
>>>>> We are using PR #1692 [2] as the master tracking PR to fix all defects
>> in the 4.8 branch.  Our goal is to get all non-skip tests to pass and then
>> merge this PR to 4.8, 4.9, and master.  For each bug, we are creating a
>> JIRA ticket and adding a commit to the PR.  Currently, the branch for this
>> PR is in the shapeblue repo (the branch started with a much smaller fix
>> from Paul and we just kept using it).  However, if others are interested in
>> picking up defects, we will move it to the ASF repo.  Once the 4.8 branch is
>> stabilized, we plan to re-execute these tests on the 4.9 and master
>> branches as we expect that the 4.9 and master branches will have additional
>> issues.
>>>>> 
>>>>> Since we are in a test freeze, I propose that no further PRs are
>> merged to the 4.8, 4.9, and master branches until they are stabilized.  The
>> following PRs will be re-based, re-tested, and merged to 4.8, 4.9.1.0,
>> and/or 4.10.0.0 post-stabilization:
>>>>> 
>>>>>   * 1696
>>>>>   * 1694
>>>>>   * 1684
>>>>>   * 1681
>>>>>   * 1680
>>>>>   * 1678
>>>>>   * 1677
>>>>>   * 1676
>>>>>   * 1674
>>>>>   * 1673
>>>>>   * 1642
>>>>>   * 1624
>>>>>   * 1615
>>>>>   * 1600
>>>>>   * 1545
>>>>>   * 1542
>>>>> 
>>>>> I recognize that this is a large backlog of contributions ready to merge,
>> and apologize for asking folks to wait.  However, given the current state of
>> the release branches, merging them before we finish fixing the smoke
>> tests would create a moving target that would further delay stabilization.
>>>>> 
>>>>> Obviously, it is unlikely we will make the 10 October 2016 release
>> date for the 4.8.2.0, 4.9.1.0, and 4.10.0.0 releases.  At this point, it is
>> difficult to estimate the size of the schedule slip because we still have
>> issues to triage and test runs to complete.  I have created a wiki page [2]
>> to track progress on this effort.
>>>>> 
>>>>> Does this approach sound reasonable?  Any suggestions to speed up this
>> process will be greatly appreciated, as stabilizing and re-opening these
>> branches ASAP is critical for the community.
>>>>> 
>>>>> Thanks,
>>>>> -John
>>>>> 
>>>>> [1]: https://issues.apache.org/jira/browse/CLOUDSTACK-9518?jql=project%20%3D%20CLOUDSTACK%20AND%20fixVersion%20in%20(4.8.2.0)%20AND%20labels%20in%20(4.8.2.0-smoke-test-failure)
>>>>> [2]: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65873020
>>>>> 
>>>>>> On Sep 26, 2016, at 8:38 AM, Will Stevens <wstev...@cloudops.com>
>> wrote:
>>>>>> 
>>>>>> Yes, I think it is important that you or Rajani sign off on anything
>> that
>>>>>> gets in while branches are frozen so you guys can stay on top of what
>> goes
>>>>>> in.
>>>>>> 
>>>>>> Thanks for all the hard work team.  :)
>>>>>> 
>>>>>> *Will STEVENS*
>>>>>> Lead Developer
>>>>>> 
>>>>>> *CloudOps* *| *Cloud Solutions Experts
>>>>>> 420 rue Guy *|* Montreal *|* Quebec *|* H3J 1S6
>>>>>> w cloudops.com *|* tw @CloudOps_
>>>>>> 
>>>>>> On Mon, Sep 26, 2016 at 2:10 AM, John Burwell <
>> john.burw...@shapeblue.com>
>>>>>> wrote:
>>>>>> 
>>>>>>> All,
>>>>>>> 
>>>>>>> Per our release schedule [1], the 4.8, 4.9, and master branches are
>> frozen
>>>>>>> for testing.  There are some straggling PRs that Rajani and I are
>> working
>>>>>>> to merge.  Is it acceptable to everyone that for the next two (2)
>> weeks,
>>>>>>> all PRs require not only 2 LGTMs, but approval by Rajani or me to be
>> merged
>>>>>>> to these branches?  To be clear, we don’t have to perform the merges,
>>>>>>> simply give a thumbs up.
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> -John
>>>>>>> john.burw...@shapeblue.com
>>>>>>> www.shapeblue.com
>>>>>>> 53 Chandos Place, Covent Garden, London VA WC2N 4HSUK
>>>>>>> @shapeblue
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>> 
>> 
>> 
>> 
>> 
>> 
>> 


john.burw...@shapeblue.com 
www.shapeblue.com
53 Chandos Place, Covent Garden, London VA WC2N 4HSUK
@shapeblue
  
 
