roywei commented on issue #15152: [CI][nightly] nightly test tutorial failure: test_tutorials.test_python_kvstore
URL: https://github.com/apache/incubator-mxnet/issues/15152#issuecomment-499193612

The root cause is that the number of GPUs changed from 2 to 1.
`NODE_LINUX_GPU` is a G3.8x with 2 GPUs and `NODE_LINUX_GPU_P3` is a P3.2x with 1 GPU, so
```
contexts = [mx.gpu(0), mx.gpu(1)] -> [mx.gpu(0)]
```
and the length of `b` changed from 2 to 1:
`b = [mx.nd.ones(shape, ctx) for ctx in contexts]`

It seems that in kvstore, when pushing a list, aggregation happens only when the list length is > 1, and then everything ends up on the same context. When the length is 1, aggregation does not happen, so the update fails because it receives NDArrays on different contexts.

If aggregation happens, everything is on the same context at update time. Everything below works:
```
b = [mx.nd.ones(shape, mx.cpu(0)), mx.nd.ones(shape, mx.gpu(2))]
kv.push(3, mx.nd.ones(shape))
```
```
b = [mx.nd.ones(shape, mx.cpu(0)), mx.nd.ones(shape, mx.gpu(2))]
kv.push(3, mx.nd.ones(shape, mx.gpu(0)))
```
When the length is 1, the user needs to make sure everything is on the same context.

Reproducible script:
```
import mxnet as mx

kv = mx.kv.create('local')  # create a local kv store.
shape = (2, 3)
kv.init(3, mx.nd.ones(shape) * 2)
a = mx.nd.zeros(shape)
kv.pull(3, out=a)
print(a.asnumpy())

kv.push(3, mx.nd.ones(shape) * 8)
kv.pull(3, out=a)  # pull out the value
print(a.asnumpy())

# The numbers used below assume 4 GPUs
gpus = 1
if gpus > 0:
    contexts = [mx.gpu(i) for i in range(gpus)]
else:
    contexts = [mx.cpu(i) for i in range(4)]
b = [mx.nd.ones(shape, ctx) for ctx in contexts]  # length 1 when gpus = 1
kv.push(3, b)
kv.pull(3, out=a)
print(a.asnumpy())

def update(key, input, stored):
    print("update on key: %d" % key)
    stored += input * 2

kv._set_updater(update)
kv.pull(3, out=a)
print(a.asnumpy())

kv.push(3, mx.nd.ones(shape, mx.gpu(0)))
#
kv.pull(3, out=a)
print(a.asnumpy())
```
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

With regards,
Apache Git Services
