[ 
https://issues.apache.org/jira/browse/SINGA-122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wangwei updated SINGA-122:
--------------------------
    Summary: Optimize memory space for Param sharing  (was: Update Param class 
on memory space controlling)

> Optimize memory space for Param sharing
> ---------------------------------------
>
>                 Key: SINGA-122
>                 URL: https://issues.apache.org/jira/browse/SINGA-122
>             Project: Singa
>          Issue Type: Improvement
>            Reporter: wangwei
>
> In deep learning models, some layers may share one or more Param objects, 
> e.g., the encoder and decoder in an auto-encoder model would share the 
> weight matrix, and RNN units (layers) would share all their parameters. For 
> parallel training with replicated layers, these layers also share parameters. 
> It is necessary to optimise the memory space of these shared parameters. 
> Otherwise, we have to allocate memory for both the parameter values and the 
> gradients of each share. That would consume a lot of memory, especially in 
> the RNN model case and the parallel training case (there could be more than 
> 10 shares of one Param object).
> To minimise the memory footprint, there are three sharing levels,
> 1. share memory space for CPU values, e.g., when one Param object is 
> replicated on different GPU cards.
> 2. share memory space for both CPU values and GPU values.
> 3. share memory space for both values and gradients.
> In terms of memory footprint, level 1 > level 2 > level 3. 
> For levels 1 and 2, the code for computing gradients is transparent to 
> parameter sharing. However, for level 3, the code must handle gradient 
> aggregation correctly; otherwise, the gradients computed for one share would 
> be overwritten by the others.
> We need to update both the Param class and the NeuralNet class (which 
> decides the sharing level for Param objects). Generally, the NeuralNet class 
> creates Param objects and determines the sharing level (together with the 
> user configuration). Layer::ComputeGradient then assigns or aggregates 
> gradients based on the sharing level (e.g., a flag). 
> Details will be updated later.
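The level-3 sharing and the assign-or-aggregate flag described above can be sketched as follows. This is a hypothetical, simplified Param for illustration only; the real SINGA Param and NeuralNet classes differ, and the names (ShareFrom, WriteGrad, first_share) are assumptions, not the actual API.

```cpp
#include <memory>
#include <vector>

// Hypothetical sketch of level-3 sharing: a simplified Param whose value
// and gradient buffers alias those of an "owner" Param, so replicas
// consume no extra memory.
struct Param {
  std::shared_ptr<std::vector<float>> value;
  std::shared_ptr<std::vector<float>> grad;
  // The flag mentioned above: the first writer assigns, later writers
  // aggregate.  Assumes the owner's gradient is computed first.
  bool first_share = true;

  explicit Param(size_t len)
      : value(std::make_shared<std::vector<float>>(len, 0.f)),
        grad(std::make_shared<std::vector<float>>(len, 0.f)) {}

  // Level 2 would alias only `value`; level 3 aliases both buffers.
  void ShareFrom(Param& owner) {
    value = owner.value;
    grad = owner.grad;
    first_share = false;  // shares accumulate instead of overwriting
  }

  // Called from each share's gradient computation: the first writer
  // assigns (clearing stale values from the previous iteration), later
  // writers add, so no share overwrites another's contribution.
  void WriteGrad(const std::vector<float>& local_grad) {
    for (size_t i = 0; i < grad->size(); ++i) {
      if (first_share)
        (*grad)[i] = local_grad[i];
      else
        (*grad)[i] += local_grad[i];
    }
  }
};
```

With this sketch, two shares writing gradients {1, 2} and {3, 4} into one shared buffer leave it holding the aggregate {4, 6} rather than the last writer's values.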



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
