SINGA-107 Error from loading pre-trained params for training stacked RBMs

    Description:
When Params are loaded from checkpoint files, their version numbers are 
reset to 0 for fine-tuning, as explained in the comments of SINGA-42.
However, if these parameters are not fine-tuned (for example, in 
https://github.com/apache/incubator-singa/tree/master/examples/rbm, RBM2 
does not update the parameters from RBM1), their versions are still 0 when 
they are dumped into the checkpoint files. When these parameters are loaded 
again to train other models, their versions are 0, so they are initialized 
again according to SINGA-42. In other words, the pre-training has no effect.

The current solution is to also load the checkpoint file in which each Param 
was first dumped, so that the Param loaded later (the correct one) overrides 
the incorrect one. Consequently, the version number will not be 0.
For example, in 
https://github.com/apache/incubator-singa/tree/master/examples/rbm/rbm3.conf, 
we configure the checkpoint files as:

checkpoint_path: "examples/rbm/rbm2/checkpoint/step6000-worker0"
checkpoint_path: "examples/rbm/rbm1/checkpoint/step6000-worker0"

in order to load w1 and b12 correctly.
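
The override behaviour described above can be illustrated with a small toy model. This is only a sketch and not SINGA's loading code: the function names, the dictionary layout, the version value 6000, and the parameter name w2 are all hypothetical, and it assumes that a Param whose stored version is 0 is the one that gets re-initialized.

```python
# Toy model of checkpoint loading order. Nothing here is SINGA's API:
# all names, the dict layout, and the version value 6000 are assumptions.

def load_checkpoints(checkpoints):
    """Merge checkpoints in the listed order; a later entry overrides an earlier one."""
    params = {}
    for ckpt in checkpoints:
        for name, (version, value) in ckpt.items():
            params[name] = (version, value)
    return params


def needs_init(params):
    """Return the sorted names of Params whose version is 0 (would be re-initialized)."""
    return sorted(name for name, (version, _) in params.items() if version == 0)


# RBM1 trained w1 and b12, so they carry a non-zero version in its checkpoint.
rbm1_ckpt = {"w1": (6000, "trained-w1"), "b12": (6000, "trained-b12")}

# RBM2 loaded w1 and b12 but did not update them, so it dumped them with version 0.
# "w2" is a hypothetical name for a Param that RBM2 itself trained.
rbm2_ckpt = {
    "w1": (0, "trained-w1"),
    "b12": (0, "trained-b12"),
    "w2": (6000, "trained-w2"),
}

# Loading only the RBM2 checkpoint leaves w1 and b12 at version 0.
only_rbm2 = load_checkpoints([rbm2_ckpt])

# Listing the RBM1 checkpoint afterwards overrides the version-0 entries.
rbm2_then_rbm1 = load_checkpoints([rbm2_ckpt, rbm1_ckpt])

print(needs_init(only_rbm2))
print(needs_init(rbm2_then_rbm1))
```

In this model the order of the list plays the role of the order of the checkpoint_path entries: the checkpoint in which a Param was first dumped comes last, so its entry is the one that survives.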


Project: http://git-wip-us.apache.org/repos/asf/incubator-singa/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-singa/commit/f16b1be6
Tree: http://git-wip-us.apache.org/repos/asf/incubator-singa/tree/f16b1be6
Diff: http://git-wip-us.apache.org/repos/asf/incubator-singa/diff/f16b1be6

Branch: refs/heads/master
Commit: f16b1be6f1d30f3ad3554c52359a69c2f643cd61
Parents: e8d01dc
Author: zhaojing <[email protected]>
Authored: Mon Dec 7 16:22:05 2015 +0800
Committer: zhaojing <[email protected]>
Committed: Tue Dec 8 11:52:48 2015 +0800

----------------------------------------------------------------------
 examples/rbm/autoencoder.conf | 6 +++---
 examples/rbm/rbm3.conf        | 1 +
 examples/rbm/rbm4.conf        | 2 ++
 3 files changed, 6 insertions(+), 3 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-singa/blob/f16b1be6/examples/rbm/autoencoder.conf
----------------------------------------------------------------------
diff --git a/examples/rbm/autoencoder.conf b/examples/rbm/autoencoder.conf
index b4dfc64..223ad0d 100644
--- a/examples/rbm/autoencoder.conf
+++ b/examples/rbm/autoencoder.conf
@@ -3,10 +3,10 @@ train_steps: 12200
 test_steps:100
 test_freq:1000
 disp_freq:100
-checkpoint_path: "examples/rbm/rbm1/checkpoint/step6000-worker0"
-checkpoint_path: "examples/rbm/rbm2/checkpoint/step6000-worker0"
-checkpoint_path: "examples/rbm/rbm3/checkpoint/step6000-worker0"
 checkpoint_path: "examples/rbm/rbm4/checkpoint/step6000-worker0"
+checkpoint_path: "examples/rbm/rbm3/checkpoint/step6000-worker0"
+checkpoint_path: "examples/rbm/rbm2/checkpoint/step6000-worker0"
+checkpoint_path: "examples/rbm/rbm1/checkpoint/step6000-worker0"
 train_one_batch{
   alg: kBP
 }

http://git-wip-us.apache.org/repos/asf/incubator-singa/blob/f16b1be6/examples/rbm/rbm3.conf
----------------------------------------------------------------------
diff --git a/examples/rbm/rbm3.conf b/examples/rbm/rbm3.conf
index 245cafc..44eae77 100644
--- a/examples/rbm/rbm3.conf
+++ b/examples/rbm/rbm3.conf
@@ -7,6 +7,7 @@ train_one_batch{
   alg: kCD
 }
 checkpoint_path: "examples/rbm/rbm2/checkpoint/step6000-worker0"
+checkpoint_path: "examples/rbm/rbm1/checkpoint/step6000-worker0"
 
 updater{
   type: kSGD

http://git-wip-us.apache.org/repos/asf/incubator-singa/blob/f16b1be6/examples/rbm/rbm4.conf
----------------------------------------------------------------------
diff --git a/examples/rbm/rbm4.conf b/examples/rbm/rbm4.conf
index cd4d40a..bb023c4 100644
--- a/examples/rbm/rbm4.conf
+++ b/examples/rbm/rbm4.conf
@@ -7,6 +7,8 @@ train_one_batch{
   alg: kCD
 }
 checkpoint_path: "examples/rbm/rbm3/checkpoint/step6000-worker0"
+checkpoint_path: "examples/rbm/rbm2/checkpoint/step6000-worker0"
+checkpoint_path: "examples/rbm/rbm1/checkpoint/step6000-worker0"
 updater{
     type: kSGD
     momentum: 0.8
