[GitHub] ThomasDelteil commented on issue #11282: Surprisngly low Traning performance on V100

GitBox Thu, 05 Jul 2018 12:28:49 -0700

ThomasDelteil commented on issue #11282: Surprisngly low Traning performance on V100
URL: https://github.com/apache/incubator-mxnet/issues/11282#issuecomment-402828395
 
 
   @ZaidQureshi it looks like you used `--model` instead of `--network`. With `--network alexnet` I get roughly 2,600 samples/sec:
   
   ```
   python3 train_imagenet.py --gpus 2 --network alexnet --test-io 0 --data-nthreads 40 --benchmark 1 --batch-size 64
   INFO:root:start with arguments Namespace(batch_size=64, benchmark=1, data_nthreads=40, data_train=None, data_train_idx='', data_val=None, data_val_idx='', disp_batches=20, dtype='float32', gc_threshold=0.5, gc_type='none', gpus='2', image_shape='3,224,224', initializer='default', kv_store='device', load_epoch=None, loss='', lr=0.1, lr_factor=0.1, lr_step_epochs='30,60', macrobatch_size=0, max_random_aspect_ratio=0.25, max_random_h=36, max_random_l=50, max_random_rotate_angle=10, max_random_s=50, max_random_scale=1, max_random_shear_ratio=0.1, min_random_scale=1, model_prefix=None, mom=0.9, monitor=0, network='alexnet', num_classes=1000, num_epochs=80, num_examples=1281167, num_layers=50, optimizer='sgd', pad_size=0, random_crop=1, random_mirror=1, rgb_mean='123.68,116.779,103.939', save_period=1, test_io=0, top_k=0, warmup_epochs=5, warmup_strategy='linear', wd=0.0001)
   [19:27:05] src/operator/nn/./cudnn/./cudnn_algoreg-inl.h:107: Running performance tests to find the best convolution algorithm, this can take a while... (setting env variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable)
   INFO:root:Epoch[0] Batch [20]        Speed: 2599.49 samples/sec      accuracy=0.028274
   INFO:root:Epoch[0] Batch [40]        Speed: 2634.29 samples/sec      accuracy=0.023438
   INFO:root:Epoch[0] Batch [60]        Speed: 2627.29 samples/sec      accuracy=0.015625
   INFO:root:Epoch[0] Batch [80]        Speed: 2630.80 samples/sec      accuracy=0.020313
   INFO:root:Epoch[0] Batch [100]       Speed: 2626.39 samples/sec      accuracy=0.021094
   INFO:root:Epoch[0] Batch [120]       Speed: 2625.73 samples/sec      accuracy=0.018750
   INFO:root:Epoch[0] Batch [140]       Speed: 2627.31 samples/sec      accuracy=0.017188
   INFO:root:Epoch[0] Batch [160]       Speed: 2625.44 samples/sec      accuracy=0.017969
   INFO:root:Epoch[0] Batch [180]       Speed: 2629.67 samples/sec      accuracy=0.018750
   INFO:root:Epoch[0] Batch [200]       Speed: 2624.55 samples/sec      accuracy=0.020313
   INFO:root:Epoch[0] Batch [220]       Speed: 2624.60 samples/sec      accuracy=0.018750
   ```
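
One possible explanation for why `--model` did not simply fail with an "unrecognized argument" error: Python's `argparse` accepts any unambiguous prefix of a long option by default, so `--model` can be read as `--model-prefix` while `--network` silently keeps its default. The sketch below is not the actual `train_imagenet.py` parser. It is a hypothetical, cut-down parser that borrows only the two option names from the `Namespace` output above, and the default network value is an assumption.

```python
import argparse

# Hypothetical, cut-down parser. Only the two option names come from the
# Namespace output above; the default network value is an assumption.
parser = argparse.ArgumentParser()
parser.add_argument("--network", type=str, default="resnet")
parser.add_argument("--model-prefix", type=str, default=None)

# `--model` is not defined, but it is an unambiguous prefix of
# `--model-prefix`, so argparse accepts it and `--network` keeps its default.
wrong = parser.parse_args(["--model", "alexnet"])

# The intended invocation sets the network and leaves the prefix unset.
right = parser.parse_args(["--network", "alexnet"])

print(wrong)
print(right)
```

If this is what happened, the `Namespace` line printed at startup would show the mistake: `model_prefix` would hold the network name and `network` would still be the default. Constructing the parser with `allow_abbrev=False` (Python 3.5+) turns such prefixes into a hard error instead.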
   
   @indhub can you please close this issue? Thanks!


