[ https://issues.apache.org/jira/browse/SINGA-47?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14693236#comment-14693236 ]
ASF subversion and git services commented on SINGA-47:
------------------------------------------------------
Commit 7a61a687c2ceb4fc7e05c2d3bbd9817e8ba59e3f in incubator-singa's branch
refs/heads/master from Wei Wang
[ https://git-wip-us.apache.org/repos/asf?p=incubator-singa.git;h=7a61a68 ]
SINGA-47 Fix a bug in data layers that leads to out-of-memory when group size
is too large
The bug is fixed by closing the data source (e.g., lmdb or datashard) after
reading a sample record in the Setup function.
The data source caches data in memory, which eats up all available memory if
there are many data layers.
> Fix a bug in data layers that leads to out-of-memory when group size is too
> large
> ----------------------------------------------------------------------------------
>
> Key: SINGA-47
> URL: https://issues.apache.org/jira/browse/SINGA-47
> Project: Singa
> Issue Type: Bug
> Reporter: wangwei
>
> The Setup function of a data layer opens the database (e.g., DataShard or
> LMDB) and reads a sample record. The sample record is necessary for setting
> upper layers' data shape. Every data layer's Setup function is called when
> SINGA creates the NeuralNet object. If the group size is 128 and
> partitioning is on dimension 0, then 128 data layers will be created. The
> memory would be used up if the database object has a large cache (prefetch)
> size.
> Although every process has the full NeuralNet object, i.e., all layers, each
> process has only a subset of workers, which run over a subset of (data)
> layers. Consequently, in one process, only a small number of data layers will
> call ComputeFeature to read data records.
> To fix the bug, we just close the database after reading one sample record in
> the Setup function, and re-open it in the ComputeFeature function. In this
> way, only a small number of database instances are open in each process.
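
The close-in-Setup, re-open-in-ComputeFeature pattern described above can be
sketched as follows. This is a minimal illustration, not the code from the
commit: FakeStore is a hypothetical stand-in for a data source such as LMDB or
DataShard, and the DataLayer class, its members, and kRecordSize are assumed
names chosen for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical record length used by the fake store below.
constexpr std::size_t kRecordSize = 4;

// Hypothetical stand-in for a data source (e.g., LMDB or DataShard).
// It only tracks how many instances are open at the same time.
class FakeStore {
 public:
  explicit FakeStore(const std::string& path) : path_(path) { ++open_count; }
  ~FakeStore() { --open_count; }
  FakeStore(const FakeStore&) = delete;
  FakeStore& operator=(const FakeStore&) = delete;

  // Returns a dummy record; a real store would read (and cache) from disk.
  std::vector<float> ReadNext() { return std::vector<float>(kRecordSize, 0.0f); }

  static int open_count;  // number of currently open stores

 private:
  std::string path_;
};
int FakeStore::open_count = 0;

// Simplified data layer (assumed interface, not SINGA's actual class).
class DataLayer {
 public:
  explicit DataLayer(const std::string& path) : path_(path) {}

  // Opens the store only long enough to read one sample record, which
  // determines the data shape. The store is closed when it leaves scope.
  void Setup() {
    FakeStore store(path_);
    sample_size_ = store.ReadNext().size();
  }

  // Re-opens the store lazily, so only layers that are actually run by a
  // worker in this process keep a store (and its cache) alive.
  std::vector<float> ComputeFeature() {
    assert(sample_size_ > 0 && "Setup must be called before ComputeFeature");
    if (!store_) store_.reset(new FakeStore(path_));
    return store_->ReadNext();
  }

  std::size_t sample_size() const { return sample_size_; }

 private:
  std::string path_;
  std::size_t sample_size_ = 0;
  std::unique_ptr<FakeStore> store_;
};
```

With this structure, calling Setup on 128 layers leaves no store open, and the
number of open stores grows only with the number of layers on which
ComputeFeature is actually called in the process.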