IssacCheng opened a new issue #15738: AMP results in higher loss
URL: https://github.com/apache/incubator-mxnet/issues/15738
 
 
   After enabling AMP (automatic mixed precision) training for this sample
   GluonNLP model:
   https://github.com/dmlc/gluon-nlp/compare/master...ChengXianbing:master
   we found that the loss is higher than when training without AMP.
   (We also enabled AMP training in our own language model and found that not
   only is the loss higher, but the training throughput also decreases from
   30445.52 samples/s to 25603.86 samples/s.)
   
   Could you please help out?
   
   ```
   $ python word_language_model.py --model gru --emsize 64 --nhid 128 --lr 1.0 --epochs 3 --bptt 10 --tied --nlayers 1 --test_mode --gpu 0
   [Epoch 0] throughput 25758.38 samples/s
   [Epoch 0] time cost 0.44s, valid loss 10.12, valid ppl 24864.71,lr 1.00
   [Epoch 0] test loss 10.15, test ppl 25633.89
   [Epoch 1] throughput 71637.50 samples/s
   [Epoch 1] time cost 0.36s, valid loss 9.06, valid ppl 8631.06,lr 1.00
   [Epoch 1] test loss 9.21, test ppl 10037.64
   [Epoch 2] throughput 66607.71 samples/s
   [Epoch 2] time cost 0.30s, valid loss 8.27, valid ppl 3919.10,lr 1.00
   [Epoch 2] test loss 8.45, test ppl 4673.45
   Total training throughput 14113.57 samples/s
   Best validation loss 8.27, val ppl 3919.10
   Best test loss 8.45, test ppl 4673.45
   Total time cost 1.82s
   
   $ python word_language_model.py --model gru --emsize 64 --nhid 128 --lr 1.0 --epochs 3 --bptt 10 --tied --nlayers 1 --test_mode --amp_training --gpu 0
   [Epoch 0] throughput 36236.71 samples/s
   [Epoch 0] time cost 0.44s, valid loss 10.41, valid ppl 33323.06,lr 1.00
   [Epoch 0] test loss 10.41, test ppl 33251.08
   [Epoch 1] throughput 64450.66 samples/s
   [Epoch 1] time cost 0.32s, valid loss 10.41, valid ppl 33322.87,lr 1.00
   [Epoch 1] test loss 10.41, test ppl 33250.94
   [Epoch 2] throughput 63184.95 samples/s
   [Epoch 2] time cost 0.33s, valid loss 10.41, valid ppl 33322.68,lr 1.00
   [Epoch 2] test loss 10.41, test ppl 33250.76
   Total training throughput 14169.86 samples/s
   Best validation loss 10.41, val ppl 33322.68
   Best test loss 10.41, test ppl 33250.76
   Total time cost 1.82s
   ```
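
To make the difference between the two runs above easier to see, here is a small sketch that takes the per-epoch validation numbers copied by hand from those logs, checks that the reported perplexity is consistent with `exp(loss)`, and measures how much the perplexity improves between the first and last epoch. The helper names (`ppl_matches_loss`, `relative_improvement`) are made up for illustration and are not part of the training script.

```python
import math

# (valid loss, valid ppl) per epoch, copied from the two logs above.
baseline_epochs = [(10.12, 24864.71), (9.06, 8631.06), (8.27, 3919.10)]
amp_epochs = [(10.41, 33323.06), (10.41, 33322.87), (10.41, 33322.68)]


def ppl_matches_loss(loss, ppl, rel_tol=0.01):
    """Check that ppl is close to exp(loss).

    The logged loss is rounded to two decimals, so a 1% tolerance
    is used (an assumption chosen for this sketch).
    """
    return math.isclose(math.exp(loss), ppl, rel_tol=rel_tol)


def relative_improvement(epochs):
    """Fractional drop in perplexity from the first to the last epoch."""
    first_ppl = epochs[0][1]
    last_ppl = epochs[-1][1]
    return (first_ppl - last_ppl) / first_ppl


for name, epochs in (("baseline", baseline_epochs), ("amp", amp_epochs)):
    consistent = all(ppl_matches_loss(loss, ppl) for loss, ppl in epochs)
    print(f"{name}: ppl consistent with exp(loss): {consistent}, "
          f"ppl improvement over 3 epochs: {relative_improvement(epochs):.4%}")
```

With these numbers the baseline perplexity falls by more than 80% over three epochs, while the AMP run changes by well under 0.01%, so the AMP run appears not to be learning at all rather than merely converging more slowly.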
   
   Here is a training log snippet from training our own language model.
   ```
   $ python train.py .... --amp_training --gpu 0
   [Epoch 0]: ?| [3442/?, loss=6.95, ppl=1040.65]
   [Epoch 0]: throughput 41170.57 samples/s
   [Epoch 0]: time cost 2.82s, valid loss 6.91, valid ppl 999.74
   [Epoch 0]: test loss 6.91, test ppl 999.82
   [Epoch 1]: ?| [3439/?, loss=6.94, ppl=1035.88]
   [Epoch 1]: throughput 42611.66 samples/s
   [Epoch 1]: time cost 2.74s, valid loss 6.90, valid ppl 995.40
   [Epoch 1]: test loss 6.90, test ppl 995.63
   [Epoch 2]: ?| [3443/?, loss=6.94, ppl=1031.34]
   [Epoch 2]: throughput 43192.13 samples/s
   [Epoch 2]: time cost 2.69s, valid loss 6.90, valid ppl 991.09
   [Epoch 2]: test loss 6.90, test ppl 991.45
   Total training throughput 25603.86 samples/s
   Best test loss 6.90, test ppl 991.45
   
   $ python train.py .... --gpu 0
   [Epoch 0]: ?| [3442/?, loss=5.22, ppl=184.47]
   [Epoch 0]: throughput 44378.70 samples/s
   [Epoch 0]: time cost 2.59s, valid loss 4.39, valid ppl 80.54
   [Epoch 0]: test loss 4.51, test ppl 91.15
   [Epoch 1]: ?| [3439/?, loss=4.49, ppl=89.51]
   [Epoch 1]: throughput 52796.93 samples/s
   [Epoch 1]: time cost 2.19s, valid loss 4.15, valid ppl 63.30
   [Epoch 1]: test loss 4.29, test ppl 72.85
   [Epoch 2]: ?| [3443/?, loss=4.28, ppl=72.37]
   [Epoch 2]: throughput 50600.25 samples/s
   [Epoch 2]: time cost 2.29s, valid loss 3.99, valid ppl 54.14,lr 1
   [Epoch 2]: test loss 4.14, test ppl 62.88
   Total training throughput 30445.52 samples/s
   Best test loss 4.14, test ppl 62.88
   ```
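
For reference, the throughput change can be quantified from the "Total training throughput" lines in the logs above. The sketch below is plain arithmetic on those reported totals; `relative_change` is a hypothetical helper name used only here.

```python
def relative_change(before, after):
    """Fractional change from `before` to `after` (negative means a drop)."""
    return (after - before) / before


# Total training throughput in samples/s, copied from the logs above.
own_lm_fp32 = 30445.52
own_lm_amp = 25603.86
gluonnlp_fp32 = 14113.57
gluonnlp_amp = 14169.86

own_lm_change = relative_change(own_lm_fp32, own_lm_amp)
gluonnlp_change = relative_change(gluonnlp_fp32, gluonnlp_amp)

print(f"own LM:   {own_lm_change:+.1%}")
print(f"GluonNLP: {gluonnlp_change:+.1%}")
```

On these totals, enabling AMP costs our own language model roughly 16% of its throughput, while the GluonNLP sample is essentially unchanged.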
   
   ```
   $ nvidia-smi -q
   ==============NVSMI LOG==============
   
   Timestamp                           : Fri Aug  2 18:38:22 2019
   Driver Version                      : 418.67
   CUDA Version                        : 10.1
   
   Attached GPUs                       : 1
   GPU 00000000:00:1E.0
       Product Name                    : Tesla V100-SXM2-16GB
       Product Brand                   : Tesla
       Display Mode                    : Enabled
       Display Active                  : Disabled
       Persistence Mode                : Disabled
       Accounting Mode                 : Disabled
       Accounting Mode Buffer Size     : 4000
       Driver Model
           Current                     : N/A
           Pending                     : N/A
       Serial Number                   : 0322917091773
       GPU UUID                        : GPU-e9a16ab2-2c86-8a0b-1126-8511d8165cd5
       Minor Number                    : 0
       VBIOS Version                   : 88.00.4F.00.09
       MultiGPU Board                  : No
       Board ID                        : 0x1e
       GPU Part Number                 : 900-2G503-0000-000
       Inforom Version
           Image Version               : G503.0201.00.03
           OEM Object                  : 1.1
           ECC Object                  : 5.0
           Power Management Object     : N/A
       GPU Operation Mode
           Current                     : N/A
           Pending                     : N/A
       GPU Virtualization Mode
           Virtualization mode         : Pass-Through
       IBMNPU
           Relaxed Ordering Mode       : N/A
       PCI
           Bus                         : 0x00
           Device                      : 0x1E
           Domain                      : 0x0000
           Device Id                   : 0x1DB110DE
           Bus Id                      : 00000000:00:1E.0
           Sub System Id               : 0x121210DE
           GPU Link Info
               PCIe Generation
                   Max                 : 3
                   Current             : 3
               Link Width
                   Max                 : 16x
                   Current             : 16x
           Bridge Chip
               Type                    : N/A
               Firmware                : N/A
           Replays Since Reset         : 0
           Replay Number Rollovers     : 0
           Tx Throughput               : 0 KB/s
           Rx Throughput               : 0 KB/s
       Fan Speed                       : N/A
       Performance State               : P0
       Clocks Throttle Reasons
           Idle                        : Not Active
           Applications Clocks Setting : Not Active
           SW Power Cap                : Not Active
           HW Slowdown                 : Not Active
               HW Thermal Slowdown     : Not Active
               HW Power Brake Slowdown : Not Active
           Sync Boost                  : Not Active
           SW Thermal Slowdown         : Not Active
           Display Clock Setting       : Not Active
       FB Memory Usage
           Total                       : 16130 MiB
           Used                        : 0 MiB
           Free                        : 16130 MiB
       BAR1 Memory Usage
           Total                       : 16384 MiB
           Used                        : 2 MiB
           Free                        : 16382 MiB
       Compute Mode                    : Default
       Utilization
           Gpu                         : 4 %
           Memory                      : 0 %
           Encoder                     : 0 %
           Decoder                     : 0 %
       Encoder Stats
           Active Sessions             : 0
           Average FPS                 : 0
           Average Latency             : 0
       FBC Stats
           Active Sessions             : 0
           Average FPS                 : 0
           Average Latency             : 0
       Ecc Mode
           Current                     : Enabled
           Pending                     : Enabled
       ECC Errors
           Volatile
               Single Bit
                   Device Memory       : 0
                   Register File       : 0
                   L1 Cache            : 0
                   L2 Cache            : 0
                   Texture Memory      : N/A
                   Texture Shared      : N/A
                   CBU                 : N/A
                   Total               : 0
               Double Bit
                   Device Memory       : 0
                   Register File       : 0
                   L1 Cache            : 0
                   L2 Cache            : 0
                   Texture Memory      : N/A
                   Texture Shared      : N/A
                   CBU                 : 0
                   Total               : 0
           Aggregate
               Single Bit
                   Device Memory       : 5
                   Register File       : 0
                   L1 Cache            : 0
                   L2 Cache            : 0
                   Texture Memory      : N/A
                   Texture Shared      : N/A
                   CBU                 : N/A
                   Total               : 5
               Double Bit
                   Device Memory       : 0
                   Register File       : 0
                   L1 Cache            : 0
                   L2 Cache            : 0
                   Texture Memory      : N/A
                   Texture Shared      : N/A
                   CBU                 : 0
                   Total               : 0
       Retired Pages
           Single Bit ECC              : 1
           Double Bit ECC              : 0
           Pending                     : No
       Temperature
           GPU Current Temp            : 50 C
           GPU Shutdown Temp           : 90 C
           GPU Slowdown Temp           : 87 C
           GPU Max Operating Temp      : 83 C
           Memory Current Temp         : 45 C
           Memory Max Operating Temp   : 85 C
       Power Readings
           Power Management            : Supported
           Power Draw                  : 42.75 W
           Power Limit                 : 300.00 W
           Default Power Limit         : 300.00 W
           Enforced Power Limit        : 300.00 W
           Min Power Limit             : 150.00 W
           Max Power Limit             : 300.00 W
       Clocks
           Graphics                    : 1312 MHz
           SM                          : 1312 MHz
           Memory                      : 877 MHz
           Video                       : 1177 MHz
       Applications Clocks
           Graphics                    : 1312 MHz
           Memory                      : 877 MHz
       Default Applications Clocks
           Graphics                    : 1312 MHz
           Memory                      : 877 MHz
       Max Clocks
           Graphics                    : 1530 MHz
           SM                          : 1530 MHz
           Memory                      : 877 MHz
           Video                       : 1372 MHz
       Max Customer Boost Clocks
           Graphics                    : 1530 MHz
       Clock Policy
           Auto Boost                  : N/A
           Auto Boost Default          : N/A
       Processes                       : None
   ```
   MXNet version: `mxnet-cu101 1.5.0b20190711`

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
