roywei commented on issue #15152: [CI][nightly] nightly test tutorial failure: 
test_tutorials.test_python_kvstore
URL: 
https://github.com/apache/incubator-mxnet/issues/15152#issuecomment-499193612
 
 
   The root cause is that the number of GPUs changed from 2 to 1. `NODE_LINUX_GPU`
is a G3.8x with 2 GPUs and `NODE_LINUX_GPU_P3` is a P3.2x with 1 GPU.
   
   so
   ```
   contexts =[mx.gpu(0), mx.gpu(1)]  ->  [mx.gpu(0)] 
   ```
   The length of `b` therefore changed from 2 to 1:
   `b = [mx.nd.ones(shape, ctx) for ctx in contexts]`
   
   It seems that in kvstore, when pushing a list, aggregation happens only
when the list length is > 1, and after aggregation everything is on the same
context when the update runs. When the length is 1, aggregation does not
happen, so the update fails because the NDArrays are on different contexts.
   All of the following work:
   ```
   b = [mx.nd.ones(shape, mx.cpu(0)), mx.nd.ones(shape, mx.gpu(2))]
   kv.push(3, mx.nd.ones(shape))
   ```
   
   ```
   b = [mx.nd.ones(shape, mx.cpu(0)), mx.nd.ones(shape, mx.gpu(2))]
   kv.push(3, mx.nd.ones(shape, mx.gpu(0)))
   ```
   
   When the length is 1, the user needs to make sure everything is on the same context.
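
One way to keep everything on one context is to move the values explicitly before pushing. The snippet below is only a minimal sketch: `on_context` is a hypothetical helper (not part of MXNet), and it relies only on each array having an `as_in_context(ctx)` method, which MXNet's `NDArray` provides.

```python
def on_context(values, ctx):
    """Place one array, or each array in a list, on `ctx`.

    Hypothetical helper for illustration. Each array only needs an
    `as_in_context(ctx)` method, as MXNet's NDArray has.
    """
    if isinstance(values, (list, tuple)):
        # A list of per-device arrays: move every element to `ctx`.
        return [v.as_in_context(ctx) for v in values]
    # A single array.
    return values.as_in_context(ctx)
```

With the script below, usage would look like `kv.push(3, on_context(b, mx.cpu(0)))`, assuming the stored value lives on `mx.cpu(0)` because it was initialized there. I have not verified this against the nightly job.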
   Reproducible script:
   ```
   import mxnet as mx
   
   kv = mx.kv.create('local') # create a local kv store.
   shape = (2,3)
   kv.init(3, mx.nd.ones(shape)*2)
   a = mx.nd.zeros(shape)
   kv.pull(3, out = a)
   print(a.asnumpy())
   kv.push(3, mx.nd.ones(shape)*8)
   kv.pull(3, out = a) # pull out the value
   print(a.asnumpy())
   
   # The numbers used below assume 4 GPUs
   gpus = 1
   if gpus > 0:
       contexts = [mx.gpu(i) for i in range(gpus)]
   else:
       contexts = [mx.cpu(i) for i in range(4)]
   
   b = [mx.nd.ones(shape, ctx) for ctx in contexts]
   kv.push(3, b)
   kv.pull(3, out = a)
   print(a.asnumpy())
   
   def update(key, input, stored):
       print("update on key: %d" % key)
       stored += input * 2
   
   
   kv._set_updater(update)
   kv.pull(3, out=a)
   print(a.asnumpy())
   
   kv.push(3, mx.nd.ones(shape, mx.gpu(0)))
   kv.pull(3, out=a)
   print(a.asnumpy())
   ```
