[MXNet Forum] Synchronization and Update Function of Parameter Server (distributed KV-Store)

MaxiBoether via MXNet Forum Wed, 20 Jan 2021 01:18:52 -0800

Hi,


I've got two questions about the distributed KV-store 
(`mx.kv.create('dist_sync')`)  available in MXNet.

1) The documentation states that pushes return immediatly and are executed 
asynchronously. Then it states we can use `_barrier()` to sync all workers. 
This however is discussed in the context of asynchronous execution. 

I do not fully understand what "syncing all workers" means here. Workers are 
not parameter servers (schedulers or servers); instead, they are the one 
executing the control flow. So, if I'm not training a neural network, but just 
using the parameter server (which is what I want to do for testing purposes), 
the worker is the one calling `kv.push`. How can the server then sync all 
workers? The server cannot push directly to the worker, can it? The worker can 
only pull from or push to the server. I imagine that calling _barrier() 
force-pulls at all clients, but imagine we do something like in the 
documentation:

```
        >>> # push a list of keys.
        >>> # single device
        >>> keys = ['4', '5', '6']
        >>> kv.push(keys, [mx.nd.ones(shape)]*len(keys))
        >>> b = [mx.nd.zeros(shape)]*len(keys)
        >>> kv.pull(keys, out=b)
        >>> print b[1].asnumpy()
        [[ 1.  1.  1.]
        [ 1.  1.  1.]]
```

I do not see how the kvstore knows which variable to update on all workers when 
_barrier() is called.

Secondly, I wonder about push guarantees. Is there also a barrier that forces 
completion of all pending operations? Because again, _barrier() seems to be 
about workers, not servers. I want to benchmark parameter server primitives and 
therefore, I need to know that an operation is finished.

2) We can use `set_updater` locally, but that will have no effect on the 
aggregation of the workers. However, there is the `set_optimizer` function, but 
that is on another level of abstraction. If I want to define my own update 
function like in the documentation 
(https://mxnet.apache.org/versions/1.7.0/api/python/docs/tutorials/packages/kvstore/kvstore.html?highlight=distributed):

```
def update(key, input, stored):
    print("update on key: %d" % key)
    stored += input * 2
```

I can't figure out how to use this in the distributed KV-store context, as 
`kv._set_updater(update)` only works for local KV-stores, but `set_optimizer` 
required inbuilt optimizers like SGD.

Thank you very much for your help!

Best,
Maximilian





---
[Visit 
Topic](https://discuss.mxnet.apache.org/t/synchronization-and-update-function-of-parameter-server-distributed-kv-store/6823/1)
 or reply to this email to respond.

You are receiving this because you enabled mailing list mode.

To unsubscribe from these emails, [click 
here](https://discuss.mxnet.apache.org/email/unsubscribe/b3057be9d6ca28435a5107e7df54a2a367d6b938f840f4addba1e5be8312d83d).

[MXNet Forum] Synchronization and Update Function of Parameter Server (distributed KV-Store)

Reply via email to