Hi Folks,

Just a quick note to say that we think we have a fix for this and are
currently testing it. We'll deploy it as soon as a few tests confirm it
works.

We think the issue is that we're spinning up VMs with less disk space
than we did on the public cloud, since jobs don't need 100GB of disk.
The problem appears once a VM has been reused a few times: because jobs
don't clean up the workspace at the end of the build, each new job that
takes over the VM has less available space.

Our working patch deploys the Workspace Cleanup plugin and forces the
workspace to be wiped at the end of each build. Will update again once
we have this deployed.
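
For anyone curious, here's a minimal sketch of what the JJB side of
that could look like, assuming the standard workspace-cleanup publisher
that JJB exposes for the plugin (the macro name below is illustrative,
not necessarily what we'll end up using):

    # Hypothetical JJB publisher macro; jobs would include it in
    # their publishers list to get the end-of-build wipe.
    - publisher:
        name: opendaylight-infra-ws-cleanup
        publishers:
          - workspace-cleanup:
              # keep a failed cleanup from failing the build itself
              fail-build: false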

Regards,
Thanh

On 27 June 2016 at 06:46, Andrej Leitner -X (anleitne - PANTHEON
TECHNOLOGIES at Cisco) <[email protected]> wrote:

> same here
>
>
> https://jenkins.opendaylight.org/releng/job/openflowplugin-distribution-check-boron/1021/console
>
> Waiting for Jenkins to finish collecting data[ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-install-plugin:2.5.2:install (default-install) 
> on project distribution-karaf:
> Failed to install artifact 
> org.opendaylight.integration:distribution-karaf:tar.gz:0.5.0-SNAPSHOT: No 
> space left on device -> [Help 1]
>
>
> ------------------------------
> *From:* [email protected] <
> [email protected]> on behalf of Anil Vishnoi <
> [email protected]>
> *Sent:* Monday, June 27, 2016 1:42 AM
> *To:* Andrew Grimberg
> *Cc:* OpenDaylight Dev; OpenDaylight Discuss; IT Infrastructure Alerts;
> [email protected]; OpenDaylight Infrastructure
> *Subject:* Re: [release] OpenDaylight Jenkins releng (production)
> maintenance 2016-06-25 @08:00 - 17:00 PDT
>
> even the verify job is failing for the same reason
>
>
> https://jenkins.opendaylight.org/releng/job/neutron-verify-boron/310/jdk=openjdk8,nodes=dynamic_verify/console
>
> Caused by: hudson.plugins.git.GitException: Command "git init 
> /w/workspace/neutron-verify-boron/jdk/openjdk8/nodes/dynamic_verify" returned 
> status code 128:
> stdout:
> stderr: fatal: cannot copy 
> '/usr/share/git-core/templates/hooks/pre-applypatch.sample' to 
> '/w/workspace/neutron-verify-boron/jdk/openjdk8/nodes/dynamic_verify/.git/hooks/pre-applypatch.sample':
>  No space left on device
>
>
> On Sun, Jun 26, 2016 at 4:39 PM, Anil Vishnoi <[email protected]>
> wrote:
>
>> Hi Andy/Thanh,
>>
>> For one of my patches the distribution job is failing because of a
>> disk space issue.
>>
>>
>> https://jenkins.opendaylight.org/releng/job/neutron-distribution-check-boron/248/console
>>
>> [ERROR] Failed to execute goal 
>> org.apache.maven.plugins:maven-install-plugin:2.5.2:install 
>> (default-install) on project distribution-karaf: Failed to install artifact 
>> org.opendaylight.integration:distribution-karaf:zip:0.5.0-SNAPSHOT: No space 
>> left on device -> [Help 1]
>>
>>
>> On Sun, Jun 26, 2016 at 12:02 PM, Andrew Grimberg <
>> [email protected]> wrote:
>>
>>> Greetings folks,
>>>
>>> Just shy of 28 hours after we started the maintenance, and having
>>> missed our original window end by only 19 hours... we're now declaring
>>> the Jenkins migration complete.
>>>
>>> I apologize for the really, really bad window estimate there.
>>>
>>> For those interested, here are all the changes that happened during
>>> this maintenance:
>>>
>>> * Jenkins migrated from the Rackspace public cloud to the private
>>> cloud. This was the bulk of the work. We had between 2.4 and 2.5TB of
>>> data synchronized between the old and new systems. The sync ran all
>>> week, and while it finished on Thursday night, the final sync with
>>> Jenkins shut down on both ends is what took most of our time.
>>>
>>> - Jenkins was updated to the latest LTS version (1.651.3). We had been
>>> running a very old LTS version; updating past it, while possible on the
>>> EL6 system it was on, was not easy because several of the plugins we
>>> use need newer system-level services. The new Jenkins system is EL7.
>>>
>>> - We transitioned from the JClouds provider plugin to the OpenStack
>>> provider plugin for all of our instance management
>>>
>>> - We reconfigured the Jenkins-hosted Maven settings files with a
>>> better naming scheme so we could do away with our mapping macros in JJB
>>>
>>> * Nexus was updated to the latest version (2.12.0 -> 2.13.0)
>>>
>>> * All systems in the CI environment received the latest system updates.
>>> We try to do this on a monthly basis anyway, so the timing was perfect.
>>>
>>> * CLM was updated to the latest version (1.19 -> 1.21)
>>>
>>> Additional changes folks may notice:
>>>
>>> * Jenkins build instances will only get reused if there is enough of
>>> a queue that a verify or merge job hits one as soon as a previous job
>>> finishes; instances now idle for only 1 minute instead of the 15 - 30
>>> minutes we allowed in the public cloud
>>>
>>> * Instances will generally start much faster, as only our images are
>>> cached on the compute nodes
>>>
>>> * For those who have looked at the vagrant definitions we use for
>>> managing the instance snapshots, you may notice that they're a bit
>>> simpler. While we haven't put in the extra work to make them work with
>>> the standard upstream vagrant boxes, our base images + vagrant
>>> definitions are now completely in the open, instead of the systems
>>> being based on something from Rackspace that we couldn't hand to the
>>> community.
>>>
>>> Finally, I want to thank Thanh for sticking with me through this
>>> migration. It was definitely a lot longer than we had originally planned
>>> but with all the work that he put in, along with the rest of the folks
>>> in integration, we seem to have ironed out most of the issues before
>>> they even showed up.
>>>
>>> At this point, the issues that I truly expect us to see are going to be
>>> capacity related, so if the queues get extra long for a bit, we're
>>> sorry. We're aware that it's a possibility with a change of this
>>> magnitude. We're going to be watching very carefully and doing what we
>>> can to tune things better.
>>>
>>> -Andy-
>>>
>>> On 06/26/2016 09:46 AM, Andrew Grimberg wrote:
>>> > Status update:
>>> >
>>> > Disk management has been completed. Jenkins is online but we're still
>>> > working through the changes that need to happen after Jenkins is
>>> > running again.
>>> >
>>> > As such Jenkins is going to remain in the non-processing 'going to shut
>>> > down' mode until we have finished our changes.
>>> >
>>> > Current estimate still puts us at ~12:00 PDT before we reopen Jenkins
>>> > for proper service.
>>> >
>>> > -Andy-
>>> >
>>> > On 06/26/2016 06:04 AM, Andrew Grimberg wrote:
>>> >> Greetings folks,
>>> >>
>>> >> Just an update on the outage. Yes, we're still down, but we're finally
>>> >> into the home stretch of disk changes before we can restart Jenkins
>>> >> and then apply the needed job changes related to the Jenkins updates.
>>> >>
>>> >> With the present rate at which the related disk changes are happening,
>>> >> I am anticipating that we'll have it back online by 12:00 PDT
>>> >> today (6 hours out).
>>> >>
>>> >> My apologies for the longer outage duration than originally planned!
>>> >> -Andy-
>>> >>
>>> >> On 06/25/2016 04:55 PM, Andrew Grimberg wrote:
>>> >>> The original window is about to close but we still haven't completed
>>> >>> the migration. From the look of things we're about 1 - 2 hours away
>>> >>> from the finalized disk sync finishing. Given how long we've been
>>> >>> down, and that we're likely to see similar lengths of time for an
>>> >>> attempt at a later date, we're just going to go ahead and power
>>> >>> through.
>>> >>>
>>> >>> Sorry for the extended outage!
>>> >>>
>>> >>> -Andy-
>>> >>>
>>> >>> On 06/25/2016 07:45 AM, Andrew Grimberg wrote:
>>> >>>> This work will be starting in 15 minutes.
>>> >>>>
>>> >>>> -Andy-
>>> >>>>
>>> >>>> On 06/23/2016 09:31 AM, Andrew Grimberg wrote:
>>> >>>>> What: The Linux Foundation will be performing the final migration
>>> >>>>> of the OpenDaylight Jenkins releng silo (aka production silo) from
>>> >>>>> the Rackspace public cloud to the private cloud
>>> >>>>>
>>> >>>>> When: Saturday, June 25, 2016 @ 08:00 - 17:00 PDT (15:00 - 00:00
>>> >>>>> UTC)
>>> >>>>>
>>> >>>>> Why: This is the final step in our migration from the Rackspace
>>> >>>>> public cloud to the private cloud.
>>> >>>>>
>>> >>>>> Impact: The production Jenkins system will be offline for the
>>> >>>>> duration of the migration. The size of the window is needed to
>>> >>>>> account for the final data synchronization of the current Jenkins
>>> >>>>> silo to the new one, which consists of ~2.3TB of data.
>>> >>>>>
>>> >>>>> Additionally, while we're doing the final disk sync we will take
>>> >>>>> the time to do needed system updates on other components of the CI
>>> >>>>> infrastructure. As such, there will be some rolling outages of
>>> >>>>> Gerrit, Nexus, Sonar, and CLM.
>>> >>>>>
>>> >>>>> We will be sending out a note to the lists and #opendaylight IRC
>>> >>>>> channel on Freenode at the beginning and end of the maintenance.
>>> >>>>>
>>> >>>>> -Andy-
>>> >>>>>
>>> >>>>
>>> >>>
>>> >>
>>> >
>>>
>>> --
>>> Andrew J Grimberg
>>> Systems Administrator
>>> Release Engineering Team Lead
>>> The Linux Foundation
>>>
>>>
>>> _______________________________________________
>>> release mailing list
>>> [email protected]
>>> https://lists.opendaylight.org/mailman/listinfo/release
>>>
>>>
>>
>>
>> --
>> Thanks
>> Anil
>>
>
>
>
> --
> Thanks
> Anil
>
> _______________________________________________
> release mailing list
> [email protected]
> https://lists.opendaylight.org/mailman/listinfo/release
>
>
_______________________________________________
Discuss mailing list
[email protected]
https://lists.opendaylight.org/mailman/listinfo/discuss