zhiics edited a comment on issue #4646: [TEST][FLAKY] topi/tests/python/test_topi_depthwise_conv2d_back_weight.py
URL: https://github.com/apache/incubator-tvm/issues/4646#issuecomment-571933507

I tested it locally and found that the problem is caused by the tests with large input sizes introduced in #4511. With these tests:

https://github.com/apache/incubator-tvm/blob/bc0274d307226408c69226cf922dd916d773e265/topi/tests/python/test_topi_conv2d_NCHWc.py#L232
https://github.com/apache/incubator-tvm/blob/bc0274d307226408c69226cf922dd916d773e265/topi/tests/python/test_topi_conv2d_int8.py#L200
https://github.com/apache/incubator-tvm/blob/bc0274d307226408c69226cf922dd916d773e265/topi/tests/python/test_topi_conv2d_nchw.py#L199

the G4 xlarge instance runs out of memory, though this may not happen on other instance types. This probably also explains why the test was flaky: some of the CI GPU instances are P2 instances, which have at least 60G RAM (https://aws.amazon.com/ec2/instance-types/p2/).

<img width="1440" alt="Screen Shot 2020-01-07 at 9 20 03 PM" src="https://user-images.githubusercontent.com/5145158/71960250-614ea700-31a9-11ea-92cf-f9d48eb30085.png">

This can be reproduced by checking out the Docker image and running `docker/bash.sh tvmai/ci-gpu:v0.56 tests/scripts/task_python_topi.sh` with the same config.cmake used in the CI.

I tried reducing the size of these tests (we probably don't want such large inputs for unit tests?) and the failures went away. The run now uses around 8G RAM (it was around 7G before this PR, if I remember correctly). Hopefully this solves the problem.
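
For anyone who wants to check the RAM numbers on their own instance, here is a minimal sketch of one way to measure peak host memory for a test command. It is not what the CI does; `peak_child_rss_gib` is a hypothetical helper name, and the sketch assumes a Unix host where Python's standard `resource` module is available. It would need to run inside the container (not around the `docker` client), and it reports the largest resident set size among waited-for child processes, not GPU memory.

```python
import resource
import subprocess
import sys


def peak_child_rss_gib(cmd):
    """Run cmd to completion; return (exit code, peak child RSS in GiB).

    Hypothetical helper for illustration only. The value is the high-water
    mark over all children this process has waited for so far, so call it
    from a fresh interpreter for a clean reading.
    """
    proc = subprocess.run(cmd)
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 ** 3 if sys.platform == "darwin" else 1024 ** 2
    return proc.returncode, usage.ru_maxrss / divisor


if __name__ == "__main__" and len(sys.argv) > 1:
    # Everything after the script name is treated as the command to measure.
    code, gib = peak_child_rss_gib(sys.argv[1:])
    print(f"exit code {code}, peak child RSS {gib:.2f} GiB")
```

Saved under an arbitrary name such as `peak_rss.py`, it could be invoked as `python3 peak_rss.py python3 -m pytest topi/tests/python/test_topi_conv2d_nchw.py`; that pytest invocation is an assumption about how a single test file would be run, so adjust it to match the CI script.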
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

With regards,
Apache Git Services
