[jira] [Commented] (SINGA-248) bug in checkpoint size in vgg-16 model

wangwei (JIRA) Fri, 23 Sep 2016 08:02:38 -0700

    [ 
https://issues.apache.org/jira/browse/SINGA-248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15516698#comment-15516698
 ]


wangwei commented on SINGA-248:
-------------------------------

Thanks for reporting this issue.
net.save() uses numpy and cPickle to dump the model parameters (about 500MB), 
which takes extra space on disk.
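
For reference, the "about 500MB" figure can be checked with a little arithmetic. The sketch below counts the parameters of the standard VGG-16 layout (3x3 convolutions, three dense layers, 224x224 input), assuming every layer has a bias and every value is stored as a 4-byte float32. The layer list is written out by hand for illustration and is not read from net.py.

```python
# (in_channels, out_channels) of each 3x3 convolution in VGG-16.
CONV_CHANNELS = [
    (3, 64), (64, 64),
    (64, 128), (128, 128),
    (128, 256), (256, 256), (256, 256),
    (256, 512), (512, 512), (512, 512),
    (512, 512), (512, 512), (512, 512),
]
# (inputs, outputs) of each dense layer; five 2x2 poolings reduce
# a 224x224 input to 7x7 with 512 channels.
DENSE_SHAPES = [(512 * 7 * 7, 4096), (4096, 4096), (4096, 1000)]


def count_params():
    """Count weights plus biases (assumption: every layer has a bias)."""
    conv = sum(3 * 3 * cin * cout + cout for cin, cout in CONV_CHANNELS)
    dense = sum(nin * nout + nout for nin, nout in DENSE_SHAPES)
    return conv + dense


total = count_params()
size_mib = total * 4 / float(2 ** 20)  # assumption: 4 bytes (float32) per value

print('parameters:', total)
print('float32 size in MiB: %.1f' % size_mib)
```

Under these assumptions the raw parameter data is a little under 530 MiB, which is in line with the caffe file size quoted in the issue.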
Here is a simple experiment I did:
{code}
>>> import numpy as np
>>> import cPickle as pickle
>>> bigary = np.random.rand(20588 + 4096 + 1000, 4096)
>>> np.save('bigary', bigary)
>>> pickary = {}
>>> pickary['a'] = np.random.rand(20588, 4096)
>>> pickary['b'] = np.random.rand(4096, 4096)
>>> pickary['c'] = np.random.rand(1000, 4096)
>>> with open('pickary', 'wb') as fd:
...     pickle.dump(pickary, fd)
...
>>> quit()
(pysinga)wangwei@slave2:~/vgg16$ ls -lh
total 4.5G
803M Sep 23 22:51 bigary.npy
1.5G Sep 23 22:39 model.bin
2.2K Sep 23 22:38 net.py
2.2G Sep 23 22:53 pickary
{code}
bigary.npy and pickary hold the same amount of data in memory (i.e. the same 
number of float values), but the sizes of the files on disk differ a lot.
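
A likely contributor to the difference is the pickle protocol. Under Python 2, pickle.dump() and cPickle.dump() default to protocol 0, which is an ASCII format, whereas np.save writes the raw array bytes. The sketch below is an illustration only, not the net.save() code: it uses small stand-in arrays (the names and shapes are made up) and compares the number of bytes written with protocol 0 and with the highest binary protocol. The exact ratio depends on the Python and NumPy versions; Python 3 encodes bytes differently at protocol 0, so the gap there is smaller than the one in the listing above.

```python
import io
import pickle

import numpy as np


def pickled_size(obj, protocol):
    """Return how many bytes pickle writes for obj with the given protocol."""
    buf = io.BytesIO()
    pickle.dump(obj, buf, protocol=protocol)
    return buf.tell()


# Small stand-ins for the parameter matrices (float64, as np.random.rand returns).
params = {
    'a': np.random.rand(200, 64),
    'b': np.random.rand(64, 64),
}
raw_bytes = sum(v.nbytes for v in params.values())

text_size = pickled_size(params, 0)                          # ASCII protocol
binary_size = pickled_size(params, pickle.HIGHEST_PROTOCOL)  # binary protocol

print('raw array bytes   :', raw_bytes)
print('protocol 0 bytes  :', text_size)
print('binary proto bytes:', binary_size)
```

If this is indeed the cause, passing protocol=pickle.HIGHEST_PROTOCOL to pickle.dump() should bring the pickled file close to the raw array size. Note also that np.random.rand produces 8-byte float64 values, so the files in this experiment are about twice as large as the same number of float32 values would be.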

One approach to reduce the checkpoint file size is to serialize the singa 
tensors into protobuf objects and use classes in io/ to dump them.
We will test it and update the code.

> bug in checkpoint size in vgg-16 model
> --------------------------------------
>
>                 Key: SINGA-248
>                 URL: https://issues.apache.org/jira/browse/SINGA-248
>             Project: Singa
>          Issue Type: Bug
>         Environment: ubuntu 14.04
>            Reporter: hacker99
>
> I created a vgg-16 net, then saved it with the python interface 
> (python/dragon/net.py) using net.save('model.bin'), and found that model.bin 
> is about 1.5GB, but the same model in caffe is just 528MB. Can anyone explain 
> why? Much appreciated.
> vgg-16 code:
> from dragon import layer
> from dragon import initializer
> from dragon import metric
> from dragon import loss
> from dragon import net as ffnet
> def ConvReLU(net, name, nb_filers, sample_shape=None):
>     net.add(layer.Conv2D(name + '_1', nb_filers, 3, 1, pad=1,
>                          input_sample_shape=sample_shape))
>     net.add(layer.Activation(name + '_3'))
> def create_net(use_cpu=False):
>     if use_cpu:
>         layer.engine = 'dragoncpp'
>     net = ffnet.FeedForwardNet(loss.SoftmaxCrossEntropy(), metric.Accuracy())
>     ConvReLU(net, 'conv1_1', 64, (3, 224, 224))
>     ConvReLU(net, 'conv1_2', 64)
>     net.add(layer.MaxPooling2D('pool1', 2, 2, border_mode='valid'))
>     ConvReLU(net, 'conv2_1', 128)
>     ConvReLU(net, 'conv2_2', 128)
>     net.add(layer.MaxPooling2D('pool2', 2, 2, border_mode='valid'))
>     ConvReLU(net, 'conv3_1', 256)
>     ConvReLU(net, 'conv3_2', 256)
>     ConvReLU(net, 'conv3_3', 256)
>     net.add(layer.MaxPooling2D('pool3', 2, 2, border_mode='valid'))
>     ConvReLU(net, 'conv4_1', 512)
>     ConvReLU(net, 'conv4_2', 512)
>     ConvReLU(net, 'conv4_3', 512)
>     net.add(layer.MaxPooling2D('pool4', 2, 2, border_mode='valid'))
>     ConvReLU(net, 'conv5_1', 512)
>     ConvReLU(net, 'conv5_2', 512)
>     ConvReLU(net, 'conv5_3', 512)
>     net.add(layer.MaxPooling2D('pool5', 2, 2, border_mode='valid'))
>     net.add(layer.Flatten('flat'))
>     net.add(layer.Dense('ip1', 4096))
>     net.add(layer.Dropout('drop_ip1', 0.5))
>     net.add(layer.Activation('relu_ip1'))
>     net.add(layer.Dense('ip2', 4096))
>     net.add(layer.Activation('relu_ip2'))
>     net.add(layer.Dropout('drop_ip2', 0.5))
>     #net.add(layer.BatchNormalization('batchnorm_ip1'))
>     net.add(layer.Dense('ip3', 1000))
>     for (p, name) in zip(net.param_values(), net.param_names()):
>         print name, p.shape
>         if 'mean' in name or 'beta' in name:
>             p.set_value(0.0)
>         elif 'var' in name:
>             p.set_value(1.0)
>         elif 'gamma' in name:
>             initializer.uniform(p, 0, 1)
>         elif len(p.shape) > 1:
>             if 'conv' in name:
>                 initializer.gaussian(p, 0, 3 * 3 * p.shape[0])
>             else:
>                 p.gaussian(0, 0.02)
>         else:
>             p.set_value(0)
>         print name, p.l1()
>     return net



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
