cjolivier01 commented on a change in pull request #7903: Refactor AdaGrad optimizer to support sparse tensors
URL: https://github.com/apache/incubator-mxnet/pull/7903#discussion_r139756939
########## File path: python/mxnet/optimizer.py ##########
@@ -665,26 +667,46 @@ class AdaGrad(Optimizer):
     eps: float, optional
         Small value to avoid division by 0.
     """
-    def __init__(self, eps=1e-7, **kwargs):
+    def __init__(self, eps=1e-7, stype='default', **kwargs):
         super(AdaGrad, self).__init__(**kwargs)
         self.float_stable_eps = eps
+        self.stype = stype
 
     def create_state(self, index, weight):
-        return zeros(weight.shape, weight.context)  # history
+        return zeros(weight.shape, weight.context, stype=self.stype)  # history
 
     def update(self, index, weight, grad, state):
+        #print("ENTER ADAGRAD UPDATE")
         assert(isinstance(weight, NDArray))
         assert(isinstance(grad, NDArray))
         self._update_count(index)
         lr = self._get_lr(index)
         wd = self._get_wd(index)
-
+        save_grad_stype = grad.stype
         grad = grad * self.rescale_grad
         if self.clip_gradient is not None:
             grad = clip(grad, -self.clip_gradient, self.clip_gradient)
         history = state
-        history[:] += (grad * grad)
-        weight[:] += -lr * (grad / sqrt(history + self.float_stable_eps) + wd * weight)
+        save_history_stype = history.stype
+
+        is_sparse = True if weight.stype != 'default' or grad.stype != 'default' else False

Review comment:
   Many of these ops support both sparse and dense input combinations (and handle them efficiently, without falling back to dense compute). Not to say this is the most efficient way to do it, but it's legal.
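As a rough illustration of the storage-type behavior under discussion (assuming an MXNet build with sparse NDArray support; the shapes, values, and variable names below are made up for the example, and only the `is_sparse` check mirrors the diff), a minimal sketch:

    # Illustrative only: how storage types propagate through the kind of
    # elementwise ops the AdaGrad update relies on. Shapes/values are arbitrary.
    import mxnet as mx

    weight = mx.nd.ones((4, 3))                       # 'default' (dense) storage
    grad = mx.nd.ones((4, 3)).tostype('row_sparse')   # sparse gradient

    # The check introduced in the diff: take the sparse path if either
    # operand uses a non-default storage type.
    is_sparse = weight.stype != 'default' or grad.stype != 'default'
    print(is_sparse)                  # True

    # Elementwise ops on row_sparse inputs keep row_sparse storage ...
    print((grad * grad).stype)        # 'row_sparse'
    print((grad * 0.5).stype)         # 'row_sparse'

    # ... and mixing dense and sparse operands is still legal; whether the
    # result keeps sparse storage or comes back dense (possibly via a
    # fallback) depends on the operator and MXNet version.
    print((weight + grad).stype)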