Wallart edited a comment on issue #13037: ImageDetIter looping forever in MXNet-1.3.0
URL: https://github.com/apache/incubator-mxnet/issues/13037#issuecomment-439200178

I'm running my code in nvidia-docker containers (Ubuntu 17.10) with CUDA 9.2. I compiled each of my MXNet versions from source with OpenCV and MKL-DNN support. On the hardware side I'm using two GTX 1080 Ti cards.

Consider the following code snippet, using a dataset in ImageRecord format (approximately 200 images):

```
import time

import mxnet as mx
from mxnet import autograd, nd

# FocalLoss, SmoothL1Loss, net, training_targets, train_data, trainer,
# ctx, start_epoch, epochs, batch_size and log_interval are defined earlier.
cls_loss = FocalLoss()
box_loss = SmoothL1Loss()
cls_metric = mx.metric.Accuracy()
box_metric = mx.metric.MAE()

for epoch in range(start_epoch, epochs):
    # reset iterator and tick
    train_data.reset()
    cls_metric.reset()
    box_metric.reset()
    epoch_tick = time.time()
    # iterate through all batches
    for i, batch in enumerate(train_data):
        batch_tick = time.time()
        # record gradients
        with autograd.record():
            x = batch.data[0].as_in_context(ctx)
            y = batch.label[0].as_in_context(ctx)
            default_anchors, class_predictions, box_predictions = net(x)
            box_target, box_mask, cls_target = training_targets(default_anchors, class_predictions, y)
            # losses
            loss1 = cls_loss(class_predictions, cls_target)
            loss2 = box_loss(box_predictions, box_target, box_mask)
            # sum all losses
            loss = loss1 + loss2
            # backpropagate
            loss.backward()
        # apply
        trainer.step(batch_size)
        # update metrics
        cls_metric.update([cls_target], [nd.transpose(class_predictions, (0, 2, 1))])
        box_metric.update([box_target], [box_predictions * box_mask])
        if (i + 1) % log_interval == 0:
            name1, val1 = cls_metric.get()
            name2, val2 = box_metric.get()
            print('[Epoch %d Batch %d] speed: %f samples/s, training: %s=%f, %s=%f'
                  % (epoch, i, batch_size / (time.time() - batch_tick), name1, val1, name2, val2))

    # end of epoch logging
    name1, val1 = cls_metric.get()
    name2, val2 = box_metric.get()
    print('[Epoch %d] training: %s=%f, %s=%f' % (epoch, name1, val1, name2, val2))
    print('[Epoch %d] time cost: %f' % (epoch, time.time() - epoch_tick))
```

On MXNet 1.2.1 it works as expected and the epochs keep flowing through the console:

> [Epoch 0] training: accuracy=0.833192, mae=0.004929
> [Epoch 0] time cost: 1.240091
> [Epoch 1] training: accuracy=0.966545, mae=0.004379
> [Epoch 1] time cost: 0.610014
> [Epoch 2] training: accuracy=0.976884, mae=0.003983
> [Epoch 2] time cost: 0.631764
> [Epoch 3] training: accuracy=0.983173, mae=0.004638

But on MXNet 1.3.0 an epoch is divided into a seemingly infinite number of batches; the iterator never signals the end of the epoch:

> [Epoch 0 Batch 19] speed: 1155.356185 samples/s, training: accuracy=0.923830, mae=0.004783
> [Epoch 0 Batch 39] speed: 1105.710115 samples/s, training: accuracy=0.954663, mae=0.004561
> [Epoch 0 Batch 59] speed: 1169.286568 samples/s, training: accuracy=0.966536, mae=0.004413
> [Epoch 0 Batch 79] speed: 1132.142250 samples/s, training: accuracy=0.973061, mae=0.004393
> [Epoch 0 Batch 99] speed: 1115.432219 samples/s, training: accuracy=0.977253, mae=0.004304
> [Epoch 0 Batch 119] speed: 1139.079420 samples/s, training: accuracy=0.980220, mae=0.004205
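As a temporary workaround I can cap each epoch at the expected number of batches so the training loop cannot run forever while this is investigated. A minimal sketch of that guard is below; the rec/idx paths, batch size, data shape and epoch count are placeholders rather than my real configuration, and only `mx.image.ImageDetIter` itself is the API in question:

```
import mxnet as mx

batch_size = 32  # placeholder value
epochs = 10      # placeholder value

# Placeholder iterator setup: 'train.rec' / 'train.idx' stand in for the real dataset.
train_data = mx.image.ImageDetIter(
    batch_size=batch_size,
    data_shape=(3, 256, 256),
    path_imgrec='train.rec',
    path_imgidx='train.idx',
    shuffle=True)

# The dataset has roughly 200 images, so cap the inner loop at
# ceil(num_samples / batch_size) batches in case the iterator never stops.
num_samples = 200
max_batches = -(-num_samples // batch_size)

for epoch in range(epochs):
    train_data.reset()
    for i, batch in enumerate(train_data):
        if i >= max_batches:
            break  # guard against the iterator looping past the end of the epoch
        # ... training step as in the snippet above ...
```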