Hello MXNet Community,

CI has been blocked for a week now due to the Windows-GPU failure. The PR to fix it is still WIP: https://github.com/apache/incubator-mxnet/pull/17808
This PR updates the toolchain from 32-bit to 64-bit (to resolve the 2 GB linker memory error that CI is currently facing), along with a host of other long-overdue updates (Visual Studio 2019, OpenCV, cuDNN, etc.).

We have 2 pending issues:
1. CUDA segfault in the Py3 Windows GPU test:
   OSError: exception: access violation writing 0x0000000000000000
2. Jenkins channel connection failure:
   "hudson.remoting.ChannelClosedException: Channel "hudson.remoting.Channel@5cca06e6:JNLP4-connect connection from [...] failed. The channel is closing down or has closed down"

We are hard at work to unblock the CI and get the fix PR merged. Since we want to focus on fixing the windows-gpu issue and avoid complicating the situation further, we are not disabling the windows-gpu build for now. As a backup plan, we will disable the windows-gpu builds by EOD Sunday, 4/5, if things don’t recover by then.

Thanks for your continued patience.

Chai,
on behalf of the MXNet CI team

On Thu, 26 Mar 2020 at 21:16, Chaitanya Bapat <chai.ba...@gmail.com> wrote:

> Hello MXNet community,
>
> It’s been over 3 days now that windows-gpu builds have been failing on CI.
> The team (me, Leo, Ningyuan, Joe, Pedro) is at work trying to identify
> the root cause and fix it.
>
> Issue: The linker is running OOM because the 32-bit toolchain cannot
> address the available memory of the machine.
>
> Multiple attempts have been made (albeit with limited success):
> 1. Reduced the number of builds per worker (for the windows-cpu node) from 3 to 1
> 2. Updated the toolchain from 32-bit to 64-bit (as pointed out by multiple
> people)
> PR: https://github.com/apache/incubator-mxnet/pull/17916
> (related to Leo’s PR:
> https://github.com/apache/incubator-mxnet/pull/17912)
>
> Road to unblock:
> An updated AMI coupled with the toolchain update should help.
> Ningyuan has an updated AMI for Windows (PR:
> https://github.com/apache/incubator-mxnet/pull/17808) - vs2019, cuda10.2,
> cmake fixes etc.
>
> We will get it deployed by tomorrow and update the status accordingly.
>
> Thanks for the patience. Apologies for the inconvenience caused.
> Thank you 🙏
> Chai,
> on behalf of the MXNet CI team
>
> --
> *Chaitanya Prakash Bapat*
> *+1 (973) 953-6299*

--
*Chaitanya Prakash Bapat*
*+1 (973) 953-6299*