wangwei created SINGA-122:
-----------------------------

             Summary: Update Param class on memory space controlling
                 Key: SINGA-122
                 URL: https://issues.apache.org/jira/browse/SINGA-122
             Project: Singa
          Issue Type: Improvement
            Reporter: wangwei


In deep learning models, some layers share one or more Param objects. 
For example, the encoder and decoder in an auto-encoder model share the 
weight matrix, and RNN units (layers) share all of their parameters. For 
parallel training with replicated layers, the replicas also share parameters.

It is necessary to optimise the memory usage of these shared parameters. 
Otherwise, we would have to allocate memory for both the parameter values and 
the gradients of every share, which consumes a lot of memory, especially for 
the RNN model case and the parallel training case (where there could be more 
than 10 shares of one Param object).

To minimise the memory footprint, there are three sharing levels:
1. share memory space for CPU values, e.g., when one Param object is 
replicated on different GPU cards;
2. share memory space for both CPU values and GPU values;
3. share memory space for both values and gradients.
In terms of memory footprint, level 1 > level 2 > level 3.
For levels 1 and 2, the code for computing gradients is transparent to 
parameter sharing. However, for level 3, the code must handle gradient 
aggregation correctly; otherwise, the gradients computed for one share would 
be overwritten by the others.

We need to update both the Param class and the NeuralNet class (which decides 
the sharing level for Param objects). Generally, the NeuralNet class creates 
the Param objects and determines the sharing level (together with the user 
configuration). Layer::ComputeGradient then assigns or aggregates gradients 
based on the sharing level (e.g., via a flag).
Details will be updated later.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)