chrishkchris commented on a change in pull request #468: Distributted module
URL: https://github.com/apache/incubator-singa/pull/468#discussion_r311068821
 
 

 ##########
 File path: src/api/config.i
 ##########
 @@ -0,0 +1,33 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+
+
+// Pass in cmake configurations to swig
+#define USE_CUDA 1
+#define USE_CUDNN 1
+#define USE_OPENCL 0
+#define USE_PYTHON 1
+#define USE_MKLDNN 1
+#define USE_JAVA 0
+#define CUDNN_VERSION 7401
+
+// SINGA version
+#define SINGA_MAJOR_VERSION 1
 
 Review comment:
   In addition to the above, I also ran a multi-GPU training and evaluation test on 8 × K80 GPUs with the CIFAR-10 dataset on ResNet-50. It reduces the training loss from 3983.8 to 345.7 in about 30 epochs, and reaches an evaluation accuracy of 86.8%. Note, however, that this does not include synchronizing the running mean and variance before the evaluation phase (see the sketch after the log below):
   ```
   Epoch=0: 100%|██████████| 195/195 [06:06<00:00,  1.91s/it]Training loss = 3983.820557, training accuracy = 0.225260
   Test accuracy = 0.347556
   Epoch=1: 100%|██████████| 195/195 [06:17<00:00,  1.94s/it]Training loss = 2628.622070, training accuracy = 0.379768
   Test accuracy = 0.437700
   Epoch=2: 100%|██████████| 195/195 [06:12<00:00,  1.89s/it]Training loss = 2347.072266, training accuracy = 0.448558
   Test accuracy = 0.459936
   Epoch=3: 100%|██████████| 195/195 [06:13<00:00,  1.88s/it]Training loss = 2075.987305, training accuracy = 0.517348
   Test accuracy = 0.548978
   Epoch=4: 100%|██████████| 195/195 [06:19<00:00,  1.97s/it]Training loss = 1890.109985, training accuracy = 0.566847
   Test accuracy = 0.594451
   Epoch=5: 100%|██████████| 195/195 [06:13<00:00,  1.92s/it]Training loss = 1720.395142, training accuracy = 0.606911
   Test accuracy = 0.633413
   Epoch=6: 100%|██████████| 195/195 [06:10<00:00,  1.92s/it]Training loss = 1555.737549, training accuracy = 0.645753
   Test accuracy = 0.659054
   Epoch=7: 100%|██████████| 195/195 [06:14<00:00,  1.91s/it]Training loss = 1385.688477, training accuracy = 0.687220
   Test accuracy = 0.709836
   Epoch=8: 100%|██████████| 195/195 [06:20<00:00,  1.97s/it]Training loss = 1269.426270, training accuracy = 0.714523
   Test accuracy = 0.735477
   Epoch=9: 100%|██████████| 195/195 [06:15<00:00,  1.91s/it]Training loss = 1137.953979, training accuracy = 0.746054
   Test accuracy = 0.745393
   Epoch=10: 100%|██████████| 195/195 [06:11<00:00,  1.88s/it]Training loss = 1031.773071, training accuracy = 0.770353
   Test accuracy = 0.750501
   Epoch=11: 100%|██████████| 195/195 [06:10<00:00,  1.89s/it]Training loss = 956.600037, training accuracy = 0.788261
   Test accuracy = 0.777744
   Epoch=12: 100%|██████████| 195/195 [06:16<00:00,  1.92s/it]Training loss = 881.050171, training accuracy = 0.804167
   Test accuracy = 0.793369
   Epoch=13: 100%|██████████| 195/195 [06:16<00:00,  1.92s/it]Training loss = 828.298828, training accuracy = 0.818309
   Test accuracy = 0.807692
   Epoch=14: 100%|██████████| 195/195 [06:11<00:00,  1.90s/it]Training loss = 790.558838, training accuracy = 0.823918
   Test accuracy = 0.795373
   Epoch=15: 100%|██████████| 195/195 [06:13<00:00,  1.90s/it]Training loss = 740.679871, training accuracy = 0.833734
   Test accuracy = 0.816707
   Epoch=16: 100%|██████████| 195/195 [06:20<00:00,  1.95s/it]Training loss = 691.391479, training accuracy = 0.846855
   Test accuracy = 0.818510
   Epoch=17: 100%|██████████| 195/195 [06:16<00:00,  1.89s/it]Training loss = 657.708130, training accuracy = 0.853986
   Test accuracy = 0.826122
   Epoch=18: 100%|██████████| 195/195 [06:10<00:00,  1.88s/it]Training loss = 627.918579, training accuracy = 0.860216
   Test accuracy = 0.844752
   Epoch=19: 100%|██████████| 195/195 [06:13<00:00,  1.91s/it]Training loss = 592.768982, training accuracy = 0.869551
   Test accuracy = 0.845653
   Epoch=20: 100%|██████████| 195/195 [06:19<00:00,  1.97s/it]Training loss = 561.560608, training accuracy = 0.875060
   Test accuracy = 0.835938
   Epoch=21: 100%|██████████| 195/195 [06:15<00:00,  1.97s/it]Training loss = 533.083740, training accuracy = 0.881370
   Test accuracy = 0.849860
   Epoch=22: 100%|██████████| 195/195 [06:12<00:00,  1.91s/it]Training loss = 508.004578, training accuracy = 0.885056
   Test accuracy = 0.833434
   Epoch=23: 100%|██████████| 195/195 [06:12<00:00,  1.92s/it]Training loss = 477.516602, training accuracy = 0.892488
   Test accuracy = 0.858474
   Epoch=24: 100%|██████████| 195/195 [06:20<00:00,  1.96s/it]Training loss = 455.839996, training accuracy = 0.896595
   Test accuracy = 0.867388
   Epoch=25: 100%|██████████| 195/195 [06:16<00:00,  1.95s/it]Training loss = 434.568390, training accuracy = 0.904327
   Test accuracy = 0.858774
   Epoch=26: 100%|██████████| 195/195 [06:10<00:00,  1.87s/it]Training loss = 414.232391, training accuracy = 0.907071
   Test accuracy = 0.833333
   Epoch=27: 100%|██████████| 195/195 [06:13<00:00,  1.87s/it]Training loss = 400.625458, training accuracy = 0.909275
   Test accuracy = 0.858974
   Epoch=28: 100%|██████████| 195/195 [06:20<00:00,  1.95s/it]Training loss = 378.750885, training accuracy = 0.914443
   Test accuracy = 0.865885
   Epoch=29: 100%|██████████| 195/195 [06:14<00:00,  1.91s/it]Training loss = 369.449249, training accuracy = 0.917548
   Test accuracy = 0.871394
   Epoch=30: 100%|██████████| 195/195 [06:13<00:00,  1.93s/it]Training loss = 345.693939, training accuracy = 0.921935
   Test accuracy = 0.868389
   ```
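   Regarding the missing synchronization of the running mean and variance, a minimal, hypothetical sketch of what it could look like is below. It assumes each `autograd.BatchNorm2d` layer exposes `running_mean` and `running_var` tensors and that the model can enumerate its layers via a `layers` attribute (both are assumptions for illustration, not part of this PR); `synchronize()` is the helper from the script that follows:
   ```
   from singa import autograd

   def synchronize_bn_stats(model, dist_opt):
       # Average each BatchNorm layer's running statistics across ranks so
       # that every worker evaluates with the same statistics.
       # `model.layers`, `running_mean` and `running_var` are assumed here,
       # not confirmed by this PR.
       for layer in model.layers:
           if isinstance(layer, autograd.BatchNorm2d):
               synchronize(layer.running_mean, dist_opt)
               synchronize(layer.running_var, dist_opt)
   ```
   Calling `synchronize_bn_stats(model, sgd)` once before setting `autograd.training = False` would then let every rank evaluate with the averaged statistics.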
   The code used is as follows:
   ```
   #
   # Licensed to the Apache Software Foundation (ASF) under one
   # or more contributor license agreements.  See the NOTICE file
   # distributed with this work for additional information
   # regarding copyright ownership.  The ASF licenses this file
   # to you under the Apache License, Version 2.0 (the
   # "License"); you may not use this file except in compliance
   # with the License.  You may obtain a copy of the License at
   #
   #   http://www.apache.org/licenses/LICENSE-2.0
   #
   # Unless required by applicable law or agreed to in writing,
   # software distributed under the License is distributed on an
   # "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
   # KIND, either express or implied.  See the License for the
   # specific language governing permissions and limitations
   # under the License.
   #
   
   try:
       import pickle
   except ImportError:
       import cPickle as pickle
       
   from singa import singa_wrap as singa
   from singa import autograd
   from singa import tensor
   from singa import device
   from singa import opt
   import cv2
   import numpy as np
   from tqdm import trange
   
   def load_dataset(filepath):
       print('Loading data file %s' % filepath)
       with open(filepath, 'rb') as fd:
           try:
               cifar10 = pickle.load(fd, encoding='latin1')
           except TypeError:
               cifar10 = pickle.load(fd)
       image = cifar10['data'].astype(dtype=np.uint8)
       image = image.reshape((-1, 3, 32, 32))
       label = np.asarray(cifar10['labels'], dtype=np.uint8)
       label = label.reshape(label.size, 1)
       return image, label
   
   
   def load_train_data(dir_path='cifar-10-batches-py', num_batches=5):
       labels = []
       batchsize = 10000
       images = np.empty((num_batches * batchsize, 3, 32, 32), dtype=np.uint8)
       for did in range(1, num_batches + 1):
           fname_train_data = dir_path + "/data_batch_{}".format(did)
           image, label = load_dataset(fname_train_data)
           images[(did - 1) * batchsize:did * batchsize] = image
           labels.extend(label)
       images = np.array(images, dtype=np.float32)
       labels = np.array(labels, dtype=np.int32)
       return images, labels
   
   
   def load_test_data(dir_path='cifar-10-batches-py'):
       images, labels = load_dataset(dir_path + "/test_batch")
       return np.array(images, dtype=np.float32), np.array(labels, dtype=np.int32)
   
   def normalize_for_resnet(train_x, test_x):
       mean = [0.4914, 0.4822, 0.4465]
       std = [0.2023, 0.1994, 0.2010]
       train_x /= 255
       test_x /= 255
       # normalize each of the three channels
       for ch in range(3):
           train_x[:, ch, :, :] -= mean[ch]
           train_x[:, ch, :, :] /= std[ch]
           test_x[:, ch, :, :] -= mean[ch]
           test_x[:, ch, :, :] /= std[ch]
       return train_x, test_x
   
   def resize_dataset(x, IMG_SIZE):
       num_data = x.shape[0]
       dim = x.shape[1]
       X = np.zeros(shape=(num_data, dim, IMG_SIZE, IMG_SIZE), dtype=np.float32)
       for n in range(num_data):
           for d in range(dim):
               X[n, d, :, :] = cv2.resize(x[n, d, :, :], (IMG_SIZE, IMG_SIZE)).astype(np.float32)
       return X
   
   def augmentation(x, batch_size):
       # pad by 4 pixels on each side, then take a random 32x32 crop and
       # randomly flip horizontally
       xpad = np.pad(x, [[0, 0], [0, 0], [4, 4], [4, 4]], 'symmetric')
       for data_num in range(batch_size):
           offset = np.random.randint(8, size=2)
           x[data_num, :, :, :] = xpad[data_num, :, offset[0]:offset[0] + 32, offset[1]:offset[1] + 32]
           if np.random.randint(2):
               x[data_num, :, :, :] = x[data_num, :, :, ::-1]
       return x
   
   def accuracy(pred, target):
       y = np.argmax(pred, axis=1)
       t = np.argmax(target, axis=1)
       a = y == t
       return np.array(a, "int").sum()
   
   def to_categorical(y, num_classes):
       y = np.array(y, dtype="int")
       n = y.shape[0]
       categorical = np.zeros((n, num_classes), dtype=np.float32)
       for i in range(n):
           categorical[i, y[i]] = 1
       return categorical
   
   def data_partition(dataset_x, dataset_y, rank_in_global, world_size):
       data_per_rank = dataset_x.shape[0] // world_size
       idx_start = rank_in_global * data_per_rank
       idx_end = (rank_in_global + 1) * data_per_rank
       return dataset_x[idx_start: idx_end], dataset_y[idx_start: idx_end]
   
   def synchronize(tensor, dist_opt):
       singa.synch(tensor.data, dist_opt.communicator)
       # cannot use tensor /= dist_opt.world_size because "/=" is not in place,
       # but "-=" is; the update below is equivalent to dividing by world_size
       tensor -= (dist_opt.world_size - 1) * tensor / dist_opt.world_size
   
   if __name__ == '__main__':
   
   
       sgd = opt.SGD(lr=0.04, momentum=0.9, weight_decay=1e-5)
       sgd = opt.DistOpt(sgd)
   
       # load dataset
       # need to download it first with "python3 incubator-singa/examples/cifar10/download_data.py py"
       train_x, train_y = load_train_data()
       test_x, test_y = load_test_data()
       train_x, test_x = normalize_for_resnet(train_x, test_x)
       train_x, train_y = data_partition(train_x, train_y, sgd.rank_in_global, sgd.world_size)
       test_x, test_y = data_partition(test_x, test_y, sgd.rank_in_global, sgd.world_size)
   
       num_classes=10
   
       from resnet import resnet50
       model = resnet50(num_classes=num_classes)
   
       print('Start initialization............')
       dev = device.create_cuda_gpu_on(sgd.rank_in_local)
   
       max_epoch = 100
       batch_size = 32
       IMG_SIZE = 224
       tx = tensor.Tensor((batch_size, 3, IMG_SIZE, IMG_SIZE), dev, tensor.float32)
       ty = tensor.Tensor((batch_size,), dev, tensor.int32)
       num_train_batch = train_x.shape[0] // batch_size
       num_test_batch = test_x.shape[0] // batch_size
       idx = np.arange(train_x.shape[0], dtype=np.int32)
       reducer = tensor.Tensor((1,), dev, tensor.float32)
   
       # all-reduce the initial parameters so that every rank starts from the same model
       autograd.training = True
       x = np.random.randn(batch_size, 3, IMG_SIZE, IMG_SIZE).astype(np.float32)
       y = np.random.randint(0, num_classes, batch_size, dtype=np.int32)
       tx.copy_from_numpy(x)
       ty.copy_from_numpy(y)
       out = model(tx)
       loss = autograd.softmax_cross_entropy(out, ty)
       for p, g in autograd.backward(loss):
           synchronize(p, sgd)
   
       for epoch in range(max_epoch):
           np.random.shuffle(idx)
   
           # Training phase
           autograd.training = True
           train_correct = np.zeros(shape=[1], dtype=np.float32)
           test_correct = np.zeros(shape=[1], dtype=np.float32)
           train_loss = np.zeros(shape=[1], dtype=np.float32)
           with trange(num_train_batch) as t:
               t.set_description('Epoch={}'.format(epoch))
               for b in t:
                   x = train_x[idx[b * batch_size: (b + 1) * batch_size]]
                   x = augmentation(x, batch_size)
                   x = resize_dataset(x,IMG_SIZE)
                   y = train_y[idx[b * batch_size: (b + 1) * batch_size]]
                   tx.copy_from_numpy(x)
                   ty.copy_from_numpy(y)
                   out = model(tx)
                   loss = autograd.softmax_cross_entropy(out, ty)
                   train_correct += accuracy(tensor.to_numpy(out), to_categorical(y, num_classes)).astype(np.float32)
                   train_loss += tensor.to_numpy(loss)[0]
                   for p, g in autograd.backward(loss):
                       sgd.update(p, g)
   
           #print("rank"+str(sgd.rank_in_global)+": Acc="+str(train_correct)+". 
Loss="+str(train_loss), flush=True)
   
           #print("world size="+str(sgd.world_size), flush=True)
   
           #reduce all the accuracy and loss from different rank
           reducer.copy_from_numpy(train_correct)
           reducer=sgd.all_reduce(reducer)
           train_correct = tensor.to_numpy(reducer) 
   
           reducer.copy_from_numpy(train_loss)
           reducer=sgd.all_reduce(reducer)
           train_loss = tensor.to_numpy(reducer) * sgd.world_size
   
           #if(sgd.rank_in_global==0):
           #    print('Training loss = %f, Acc count = %f' % (train_loss, 
train_correct), flush=True)
   
           if(sgd.rank_in_global==0):
               print('Training loss = %f, training accuracy = %f' % 
(train_loss, train_correct / (num_train_batch*batch_size)), flush=True)
   
   
           # Evaluation phase
           autograd.training = False
           for b in range(num_test_batch):
               x = test_x[b * batch_size: (b + 1) * batch_size]
               x = resize_dataset(x, IMG_SIZE)
               y = test_y[b * batch_size: (b + 1) * batch_size]
               tx.copy_from_numpy(x)
               ty.copy_from_numpy(y)
               out_test = model(tx)
               test_correct += accuracy(tensor.to_numpy(out_test), to_categorical(y, num_classes))
   
           reducer.copy_from_numpy(test_correct)
           reducer = sgd.all_reduce(reducer)
           test_correct = tensor.to_numpy(reducer)

           if sgd.rank_in_global == 0:
               print('Test accuracy = %f' % (test_correct / (num_test_batch * batch_size)), flush=True)
   
   ```
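   As a side note on the `synchronize()` helper: assuming `singa.synch` performs an in-place all-reduce sum (as the comment in the helper suggests), the update `t -= (world_size - 1) * t / world_size` is algebraically the same as dividing the summed tensor by the world size. A quick self-contained numpy check with made-up values, just to illustrate the identity:
   ```
   import numpy as np

   world_size = 8
   t = np.random.randn(4).astype(np.float32)  # stands in for an all-reduced (summed) tensor

   expected = t / world_size               # the average we actually want
   t -= (world_size - 1) * t / world_size  # the in-place formulation from synchronize()
   assert np.allclose(t, expected)
   ```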
   
   
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
