ZaidQureshi opened a new issue #11284: Issues with Gluon example/gluon/image_classification.py example
URL: https://github.com/apache/incubator-mxnet/issues/11284

## Description

When trying to train a network (AlexNet or ResNet-50) using --dtype float16, I get an error once the first epoch finishes and validation starts: a data type of float32 was expected but float16 was given. Also, there doesn't seem to be a way to train on synthetic data with this example (i.e. the script doesn't accept --benchmark 1), although the README for example/image_classification claims it's possible.

## Environment info (Required)

```
----------Python Info----------
Version : 3.5.2
Compiler : GCC 5.4.0 20160609
Build : ('default', 'Nov 23 2017 16:37:01')
Arch : ('64bit', 'ELF')
------------Pip Info-----------
Version : 10.0.1
Directory : /usr/local/lib/python3.5/dist-packages/pip
----------MXNet Info-----------
Version : 1.3.0
Directory : /usr/local/lib/python3.5/dist-packages/mxnet
Commit Hash : b434b8ec18f774c99b0830bd3ca66859212b4911
----------System Info----------
Platform : Linux-4.13.0-45-generic-x86_64-with-Ubuntu-16.04-xenial
system : Linux
node : css-host-8
release : 4.13.0-45-generic
version : #50~16.04.1-Ubuntu SMP Wed May 30 11:18:27 UTC 2018
----------Hardware Info----------
machine : x86_64
processor : x86_64
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 40
On-line CPU(s) list: 0-39
Thread(s) per core: 2
Core(s) per socket: 10
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 79
Model name: Intel(R) Xeon(R) CPU E5-2640 v4 @ 2.40GHz
Stepping: 1
CPU MHz: 1200.189
CPU max MHz: 3400.0000
CPU min MHz: 1200.0000
BogoMIPS: 4799.72
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 25600K
NUMA node0 CPU(s): 0-9,20-29
NUMA node1 CPU(s): 10-19,30-39
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti retpoline intel_ppin intel_pt spec_ctrl tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdt_a rdseed adx smap xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts
----------Network Test----------
Setting timeout: 10
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0255 sec, LOAD: 0.1334 sec.
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0100 sec, LOAD: 0.5690 sec.
Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.3449 sec, LOAD: 1.6452 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0099 sec, LOAD: 0.3464 sec.
Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.2326 sec, LOAD: 0.6122 sec.
Timing for Gluon Tutorial(cn): https://zh.gluon.ai, DNS: 0.3419 sec, LOAD: 2.3122 sec.
```

Package used (Python/R/Scala/Julia): I'm using the Python 3 package mxnet-cu91.

## Build info (Required if built from source)

Compiler (gcc/clang/mingw/visual studio): gcc

MXNet commit hash: b434b8ec18f774c99b0830bd3ca66859212b4911

Build config: I am using the python3 mxnet-cu91 package.
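To make the failure easier to see outside the full script, here is a minimal standalone sketch (my own reduction, not the example code itself; it assumes a single CUDA GPU): casting the network to float16 and then feeding it a default float32 batch triggers the same "requires uniform type" error that appears in the logs below, while casting the batch first works.

```python
import mxnet as mx
from mxnet import nd
from mxnet.gluon.model_zoo import vision

ctx = mx.gpu(0)                      # assumes a CUDA GPU is available
net = vision.resnet50_v1()           # same model as the failing run
net.initialize(mx.init.Xavier(), ctx=ctx)
net.cast('float16')                  # what --dtype float16 does to the network

x = nd.random.uniform(shape=(1, 3, 224, 224), ctx=ctx)  # float32 by default
# net(x) raises: "This layer requires uniform type. Expected 'float32' v.s.
# given 'float16' at 'weight'" -- the same check that fails during validation.
out = net(x.astype('float16'))       # casting the batch to float16 avoids it
print(out.dtype)                     # numpy.float16
```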
## Error Message

ResNet-50:

```
INFO:root:Starting new image-classification task:, Namespace(batch_norm=False, batch_size=128, builtin_profiler=0, data_dir='', dataset='dummy', dtype='float16', epochs=10, gpus='2', kvstore='device', log_interval=1, lr=0.1, lr_factor=0.1, lr_steps='30,60,90', mode='imperative', model='resnet50_v1', momentum=0.9, num_workers=4, prefix='', profile=False, resume='', save_frequency=10, seed=123, start_epoch=0, use_pretrained=False, use_thumbnail=False, wd=0.0001)
[11:23:39] src/operator/nn/./cudnn/./cudnn_algoreg-inl.h:107: Running performance tests to find the best convolution algorithm, this can take a while... (setting env variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable)
INFO:root:Epoch[0] Batch [0] Speed: 38.761533 samples/sec accuracy=1.000000, top_k_accuracy_5=0.000000 INFO:root:Epoch[0] Batch [1] Speed: 430.549337 samples/sec accuracy=1.000000, top_k_accuracy_5=0.500000 INFO:root:Epoch[0] Batch [2] Speed: 629.449014 samples/sec accuracy=1.000000, top_k_accuracy_5=0.666667 INFO:root:Epoch[0] Batch [3] Speed: 632.102159 samples/sec accuracy=1.000000, top_k_accuracy_5=0.750000 INFO:root:Epoch[0] Batch [4] Speed: 636.633253 samples/sec accuracy=1.000000, top_k_accuracy_5=0.800000 INFO:root:Epoch[0] Batch [5] Speed: 644.789297 samples/sec accuracy=1.000000, top_k_accuracy_5=0.833333 INFO:root:Epoch[0] Batch [6] Speed: 632.178824 samples/sec accuracy=1.000000, top_k_accuracy_5=0.857143 INFO:root:Epoch[0] Batch [7] Speed: 634.513685 samples/sec accuracy=1.000000, top_k_accuracy_5=0.875000 INFO:root:Epoch[0] Batch [8] Speed: 638.223330 samples/sec accuracy=1.000000, top_k_accuracy_5=0.888889 INFO:root:Epoch[0] Batch [9] Speed: 642.008849 samples/sec accuracy=1.000000, top_k_accuracy_5=0.900000 INFO:root:Epoch[0] Batch [10] Speed: 642.620550 samples/sec accuracy=1.000000, top_k_accuracy_5=0.909091 INFO:root:Epoch[0] Batch [11] Speed: 635.800896 samples/sec accuracy=1.000000, top_k_accuracy_5=0.916667 INFO:root:Epoch[0] Batch [12] Speed: 641.673524 samples/sec accuracy=1.000000, top_k_accuracy_5=0.923077 INFO:root:Epoch[0] Batch [13] Speed: 628.907794 samples/sec accuracy=1.000000, top_k_accuracy_5=0.928571 INFO:root:Epoch[0] Batch [14] Speed: 631.297189 samples/sec accuracy=1.000000, top_k_accuracy_5=0.933333 INFO:root:Epoch[0] Batch [15] Speed: 635.057088 samples/sec accuracy=1.000000, top_k_accuracy_5=0.937500 INFO:root:Epoch[0] Batch [16] Speed: 629.546444 samples/sec accuracy=1.000000, top_k_accuracy_5=0.941176 INFO:root:Epoch[0] Batch [17] Speed: 633.887375 samples/sec accuracy=1.000000, top_k_accuracy_5=0.944444 INFO:root:Epoch[0] Batch [18] Speed: 637.994282 samples/sec accuracy=1.000000, top_k_accuracy_5=0.947368 INFO:root:Epoch[0] Batch [19] Speed: 625.316271 samples/sec accuracy=1.000000, top_k_accuracy_5=0.950000 INFO:root:Epoch[0] Batch [20] Speed: 633.214498 samples/sec accuracy=1.000000, top_k_accuracy_5=0.952381 INFO:root:Epoch[0] Batch [21] Speed: 632.507277 samples/sec accuracy=1.000000, top_k_accuracy_5=0.954545 INFO:root:Epoch[0] Batch [22] Speed: 637.411028 samples/sec accuracy=1.000000, top_k_accuracy_5=0.956522 INFO:root:Epoch[0] Batch [23] Speed: 634.509185 samples/sec accuracy=1.000000, top_k_accuracy_5=0.958333 INFO:root:Epoch[0] Batch [24] Speed: 630.860258 samples/sec accuracy=1.000000, top_k_accuracy_5=0.960000 INFO:root:Epoch[0] Batch [25] Speed: 632.370939 samples/sec accuracy=1.000000, top_k_accuracy_5=0.961538 INFO:root:Epoch[0] Batch [26] Speed: 631.902770 samples/sec accuracy=1.000000, top_k_accuracy_5=0.962963
INFO:root:Epoch[0] Batch [27] Speed: 628.879800 samples/sec accuracy=1.000000, top_k_accuracy_5=0.964286 INFO:root:Epoch[0] Batch [28] Speed: 637.116021 samples/sec accuracy=1.000000, top_k_accuracy_5=0.965517 INFO:root:Epoch[0] Batch [29] Speed: 623.945647 samples/sec accuracy=1.000000, top_k_accuracy_5=0.966667 INFO:root:Epoch[0] Batch [30] Speed: 614.313059 samples/sec accuracy=1.000000, top_k_accuracy_5=0.967742 INFO:root:Epoch[0] Batch [31] Speed: 598.348864 samples/sec accuracy=1.000000, top_k_accuracy_5=0.968750 INFO:root:Epoch[0] Batch [32] Speed: 630.654243 samples/sec accuracy=1.000000, top_k_accuracy_5=0.969697 INFO:root:Epoch[0] Batch [33] Speed: 626.203179 samples/sec accuracy=1.000000, top_k_accuracy_5=0.970588 INFO:root:Epoch[0] Batch [34] Speed: 621.371898 samples/sec accuracy=1.000000, top_k_accuracy_5=0.971429 INFO:root:Epoch[0] Batch [35] Speed: 625.962240 samples/sec accuracy=1.000000, top_k_accuracy_5=0.972222 INFO:root:Epoch[0] Batch [36] Speed: 638.778429 samples/sec accuracy=1.000000, top_k_accuracy_5=0.972973 INFO:root:Epoch[0] Batch [37] Speed: 638.853680 samples/sec accuracy=1.000000, top_k_accuracy_5=0.973684 INFO:root:Epoch[0] Batch [38] Speed: 635.063098 samples/sec accuracy=1.000000, top_k_accuracy_5=0.974359 INFO:root:Epoch[0] Batch [39] Speed: 641.104962 samples/sec accuracy=1.000000, top_k_accuracy_5=0.975000 INFO:root:Epoch[0] Batch [40] Speed: 628.048478 samples/sec accuracy=1.000000, top_k_accuracy_5=0.975610 INFO:root:Epoch[0] Batch [41] Speed: 624.718154 samples/sec accuracy=1.000000, top_k_accuracy_5=0.976190 INFO:root:Epoch[0] Batch [42] Speed: 630.666097 samples/sec accuracy=1.000000, top_k_accuracy_5=0.976744 INFO:root:Epoch[0] Batch [43] Speed: 629.468940 samples/sec accuracy=1.000000, top_k_accuracy_5=0.977273 INFO:root:Epoch[0] Batch [44] Speed: 634.288790 samples/sec accuracy=1.000000, top_k_accuracy_5=0.977778 INFO:root:Epoch[0] Batch [45] Speed: 620.335145 samples/sec accuracy=1.000000, top_k_accuracy_5=0.978261 INFO:root:Epoch[0] Batch [46] Speed: 630.456506 samples/sec accuracy=1.000000, top_k_accuracy_5=0.978723 INFO:root:Epoch[0] Batch [47] Speed: 631.097565 samples/sec accuracy=1.000000, top_k_accuracy_5=0.979167 INFO:root:Epoch[0] Batch [48] Speed: 625.456142 samples/sec accuracy=1.000000, top_k_accuracy_5=0.979592 INFO:root:Epoch[0] Batch [49] Speed: 597.499816 samples/sec accuracy=1.000000, top_k_accuracy_5=0.980000 INFO:root:Epoch[0] Batch [50] Speed: 631.840300 samples/sec accuracy=1.000000, top_k_accuracy_5=0.980392 INFO:root:Epoch[0] Batch [51] Speed: 635.177303 samples/sec accuracy=1.000000, top_k_accuracy_5=0.980769 INFO:root:Epoch[0] Batch [52] Speed: 629.745088 samples/sec accuracy=1.000000, top_k_accuracy_5=0.981132 INFO:root:Epoch[0] Batch [53] Speed: 633.966719 samples/sec accuracy=1.000000, top_k_accuracy_5=0.981481 INFO:root:Epoch[0] Batch [54] Speed: 633.290685 samples/sec accuracy=1.000000, top_k_accuracy_5=0.981818 INFO:root:Epoch[0] Batch [55] Speed: 629.980078 samples/sec accuracy=1.000000, top_k_accuracy_5=0.982143 INFO:root:Epoch[0] Batch [56] Speed: 627.281642 samples/sec accuracy=1.000000, top_k_accuracy_5=0.982456 INFO:root:Epoch[0] Batch [57] Speed: 636.660431 samples/sec accuracy=1.000000, top_k_accuracy_5=0.982759 INFO:root:Epoch[0] Batch [58] Speed: 638.425210 samples/sec accuracy=1.000000, top_k_accuracy_5=0.983051 INFO:root:Epoch[0] Batch [59] Speed: 628.249118 samples/sec accuracy=1.000000, top_k_accuracy_5=0.983333 INFO:root:Epoch[0] Batch [60] Speed: 634.437953 samples/sec accuracy=1.000000, 
top_k_accuracy_5=0.983607 INFO:root:Epoch[0] Batch [61] Speed: 629.313991 samples/sec accuracy=1.000000, top_k_accuracy_5=0.983871 INFO:root:Epoch[0] Batch [62] Speed: 629.671966 samples/sec accuracy=1.000000, top_k_accuracy_5=0.984127 INFO:root:Epoch[0] Batch [63] Speed: 636.321618 samples/sec accuracy=1.000000, top_k_accuracy_5=0.984375 INFO:root:Epoch[0] Batch [64] Speed: 636.397046 samples/sec accuracy=1.000000, top_k_accuracy_5=0.984615 INFO:root:Epoch[0] Batch [65] Speed: 630.730557 samples/sec accuracy=1.000000, top_k_accuracy_5=0.984848 INFO:root:Epoch[0] Batch [66] Speed: 635.813696 samples/sec accuracy=1.000000, top_k_accuracy_5=0.985075 INFO:root:Epoch[0] Batch [67] Speed: 639.850346 samples/sec accuracy=1.000000, top_k_accuracy_5=0.985294 INFO:root:Epoch[0] Batch [68] Speed: 630.247795 samples/sec accuracy=1.000000, top_k_accuracy_5=0.985507 INFO:root:Epoch[0] Batch [69] Speed: 642.182405 samples/sec accuracy=1.000000, top_k_accuracy_5=0.985714 INFO:root:Epoch[0] Batch [70] Speed: 629.762078 samples/sec accuracy=1.000000, top_k_accuracy_5=0.985915 INFO:root:Epoch[0] Batch [71] Speed: 635.816708 samples/sec accuracy=1.000000, top_k_accuracy_5=0.986111 INFO:root:Epoch[0] Batch [72] Speed: 631.053798 samples/sec accuracy=1.000000, top_k_accuracy_5=0.986301 INFO:root:Epoch[0] Batch [73] Speed: 635.151002 samples/sec accuracy=1.000000, top_k_accuracy_5=0.986486 INFO:root:Epoch[0] Batch [74] Speed: 631.234839 samples/sec accuracy=1.000000, top_k_accuracy_5=0.986667 INFO:root:Epoch[0] Batch [75] Speed: 637.794947 samples/sec accuracy=1.000000, top_k_accuracy_5=0.986842 INFO:root:Epoch[0] Batch [76] Speed: 635.627762 samples/sec accuracy=1.000000, top_k_accuracy_5=0.987013 INFO:root:Epoch[0] Batch [77] Speed: 634.704971 samples/sec accuracy=1.000000, top_k_accuracy_5=0.987179 INFO:root:Epoch[0] Batch [78] Speed: 626.428223 samples/sec accuracy=1.000000, top_k_accuracy_5=0.987342 INFO:root:Epoch[0] Batch [79] Speed: 631.866328 samples/sec accuracy=1.000000, top_k_accuracy_5=0.987500 INFO:root:Epoch[0] Batch [80] Speed: 633.852949 samples/sec accuracy=1.000000, top_k_accuracy_5=0.987654 INFO:root:Epoch[0] Batch [81] Speed: 628.484464 samples/sec accuracy=1.000000, top_k_accuracy_5=0.987805 INFO:root:Epoch[0] Batch [82] Speed: 639.696342 samples/sec accuracy=1.000000, top_k_accuracy_5=0.987952 INFO:root:Epoch[0] Batch [83] Speed: 628.357943 samples/sec accuracy=1.000000, top_k_accuracy_5=0.988095 INFO:root:Epoch[0] Batch [84] Speed: 632.414144 samples/sec accuracy=1.000000, top_k_accuracy_5=0.988235 INFO:root:Epoch[0] Batch [85] Speed: 635.123201 samples/sec accuracy=1.000000, top_k_accuracy_5=0.988372 INFO:root:Epoch[0] Batch [86] Speed: 637.747216 samples/sec accuracy=1.000000, top_k_accuracy_5=0.988506 INFO:root:Epoch[0] Batch [87] Speed: 630.293670 samples/sec accuracy=1.000000, top_k_accuracy_5=0.988636 INFO:root:Epoch[0] Batch [88] Speed: 634.680210 samples/sec accuracy=1.000000, top_k_accuracy_5=0.988764 INFO:root:Epoch[0] Batch [89] Speed: 630.405426 samples/sec accuracy=1.000000, top_k_accuracy_5=0.988889 INFO:root:Epoch[0] Batch [90] Speed: 630.142012 samples/sec accuracy=1.000000, top_k_accuracy_5=0.989011 INFO:root:Epoch[0] Batch [91] Speed: 635.331395 samples/sec accuracy=1.000000, top_k_accuracy_5=0.989130 INFO:root:Epoch[0] Batch [92] Speed: 637.355788 samples/sec accuracy=1.000000, top_k_accuracy_5=0.989247 INFO:root:Epoch[0] Batch [93] Speed: 630.492786 samples/sec accuracy=1.000000, top_k_accuracy_5=0.989362 INFO:root:Epoch[0] Batch [94] Speed: 627.613096 samples/sec 
accuracy=1.000000, top_k_accuracy_5=0.989474 INFO:root:Epoch[0] Batch [95] Speed: 636.513241 samples/sec accuracy=1.000000, top_k_accuracy_5=0.989583 INFO:root:Epoch[0] Batch [96] Speed: 639.884664 samples/sec accuracy=1.000000, top_k_accuracy_5=0.989691 INFO:root:Epoch[0] Batch [97] Speed: 635.479544 samples/sec accuracy=1.000000, top_k_accuracy_5=0.989796 INFO:root:Epoch[0] Batch [98] Speed: 637.682071 samples/sec accuracy=1.000000, top_k_accuracy_5=0.989899 INFO:root:Epoch[0] Batch [99] Speed: 631.076052 samples/sec accuracy=1.000000, top_k_accuracy_5=0.990000
INFO:root:[Epoch 0] training: accuracy=1.000000, top_k_accuracy_5=0.990000
INFO:root:[Epoch 0] time cost: 23.480652
Traceback (most recent call last):
  File "../gluon/image_classification.py", line 290, in <module>
    main()
  File "../gluon/image_classification.py", line 274, in main
    train(opt, context)
  File "../gluon/image_classification.py", line 242, in train
    name, val_acc = test(ctx, val_data)
  File "../gluon/image_classification.py", line 166, in test
    outputs.append(net(x))
  File "/usr/local/lib/python3.5/dist-packages/mxnet/gluon/block.py", line 481, in __call__
    out = self.forward(*args)
  File "/usr/local/lib/python3.5/dist-packages/mxnet/gluon/block.py", line 821, in forward
    return self.hybrid_forward(ndarray, x, *args, **params)
  File "/usr/local/lib/python3.5/dist-packages/mxnet/gluon/model_zoo/vision/resnet.py", line 279, in hybrid_forward
    x = self.features(x)
  File "/usr/local/lib/python3.5/dist-packages/mxnet/gluon/block.py", line 481, in __call__
    out = self.forward(*args)
  File "/usr/local/lib/python3.5/dist-packages/mxnet/gluon/block.py", line 821, in forward
    return self.hybrid_forward(ndarray, x, *args, **params)
  File "/usr/local/lib/python3.5/dist-packages/mxnet/gluon/nn/basic_layers.py", line 117, in hybrid_forward
    x = block(x)
  File "/usr/local/lib/python3.5/dist-packages/mxnet/gluon/block.py", line 481, in __call__
    out = self.forward(*args)
  File "/usr/local/lib/python3.5/dist-packages/mxnet/gluon/block.py", line 821, in forward
    return self.hybrid_forward(ndarray, x, *args, **params)
  File "/usr/local/lib/python3.5/dist-packages/mxnet/gluon/nn/conv_layers.py", line 133, in hybrid_forward
    act = getattr(F, self._op_name)(x, weight, name='fwd', **self._kwargs)
  File "<string>", line 167, in Convolution
  File "/usr/local/lib/python3.5/dist-packages/mxnet/_ctypes/ndarray.py", line 92, in _imperative_invoke
    ctypes.byref(out_stypes)))
  File "/usr/local/lib/python3.5/dist-packages/mxnet/base.py", line 210, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [11:24:02] src/operator/nn/convolution.cc:283: Check failed: (*in_type)[i] == dtype (2 vs. 0) This layer requires uniform type. Expected 'float32' v.s. given 'float16' at 'weight'

Stack trace returned 10 entries:
[bt] (0) /usr/local/lib/python3.5/dist-packages/mxnet/libmxnet.so(+0x34d0ea) [0x7fcc47e1a0ea]
[bt] (1) /usr/local/lib/python3.5/dist-packages/mxnet/libmxnet.so(+0x34d711) [0x7fcc47e1a711]
[bt] (2) /usr/local/lib/python3.5/dist-packages/mxnet/libmxnet.so(+0x59284c) [0x7fcc4805f84c]
[bt] (3) /usr/local/lib/python3.5/dist-packages/mxnet/libmxnet.so(+0x26d963f) [0x7fcc4a1a663f]
[bt] (4) /usr/local/lib/python3.5/dist-packages/mxnet/libmxnet.so(+0x26e2cad) [0x7fcc4a1afcad]
[bt] (5) /usr/local/lib/python3.5/dist-packages/mxnet/libmxnet.so(+0x25ffe99) [0x7fcc4a0cce99]
[bt] (6) /usr/local/lib/python3.5/dist-packages/mxnet/libmxnet.so(MXImperativeInvokeEx+0x6f) [0x7fcc4a0cd48f]
[bt] (7) /usr/lib/python3.5/lib-dynload/_ctypes.cpython-35m-x86_64-linux-gnu.so(ffi_call_unix64+0x4c) [0x7fccd70f7e20]
[bt] (8) /usr/lib/python3.5/lib-dynload/_ctypes.cpython-35m-x86_64-linux-gnu.so(ffi_call+0x2eb) [0x7fccd70f788b]
[bt] (9) /usr/lib/python3.5/lib-dynload/_ctypes.cpython-35m-x86_64-linux-gnu.so(_ctypes_callproc+0x49a) [0x7fccd70f201a]
```

AlexNet:

```
INFO:root:Starting new image-classification task:, Namespace(batch_norm=False, batch_size=128, builtin_profiler=0, data_dir='', dataset='dummy', dtype='float16', epochs=10, gpus='2', kvstore='device', log_interval=1, lr=0.1, lr_factor=0.1, lr_steps='30,60,90', mode='imperative', model='alexnet', momentum=0.9, num_workers=4, prefix='', profile=False, resume='', save_frequency=10, seed=123, start_epoch=0, use_pretrained=False, use_thumbnail=False, wd=0.0001)
[11:25:47] src/operator/nn/./cudnn/./cudnn_algoreg-inl.h:107: Running performance tests to find the best convolution algorithm, this can take a while... (setting env variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable)
INFO:root:Epoch[0] Batch [0] Speed: 129.885558 samples/sec accuracy=1.000000, top_k_accuracy_5=0.000000 INFO:root:Epoch[0] Batch [1] Speed: 1461.012374 samples/sec accuracy=1.000000, top_k_accuracy_5=0.500000 INFO:root:Epoch[0] Batch [2] Speed: 1709.083278 samples/sec accuracy=1.000000, top_k_accuracy_5=0.666667 INFO:root:Epoch[0] Batch [3] Speed: 2039.070355 samples/sec accuracy=1.000000, top_k_accuracy_5=0.750000 INFO:root:Epoch[0] Batch [4] Speed: 1956.020534 samples/sec accuracy=1.000000, top_k_accuracy_5=0.800000 INFO:root:Epoch[0] Batch [5] Speed: 2144.635572 samples/sec accuracy=1.000000, top_k_accuracy_5=0.833333 INFO:root:Epoch[0] Batch [6] Speed: 2110.291864 samples/sec accuracy=1.000000, top_k_accuracy_5=0.857143 INFO:root:Epoch[0] Batch [7] Speed: 2191.077322 samples/sec accuracy=1.000000, top_k_accuracy_5=0.875000 INFO:root:Epoch[0] Batch [8] Speed: 2126.322487 samples/sec accuracy=1.000000, top_k_accuracy_5=0.888889 INFO:root:Epoch[0] Batch [9] Speed: 2168.509516 samples/sec accuracy=1.000000, top_k_accuracy_5=0.900000 INFO:root:Epoch[0] Batch [10] Speed: 2211.210742 samples/sec accuracy=1.000000, top_k_accuracy_5=0.909091 INFO:root:Epoch[0] Batch [11] Speed: 2181.843317 samples/sec accuracy=1.000000, top_k_accuracy_5=0.916667 INFO:root:Epoch[0] Batch [12] Speed: 1934.356274 samples/sec accuracy=1.000000, top_k_accuracy_5=0.923077 INFO:root:Epoch[0] Batch [13] Speed: 2177.674933 samples/sec accuracy=1.000000, top_k_accuracy_5=0.928571 INFO:root:Epoch[0] Batch [14] Speed: 2229.789643 samples/sec accuracy=1.000000, top_k_accuracy_5=0.933333 INFO:root:Epoch[0] Batch [15] Speed: 2218.034902 samples/sec accuracy=1.000000, top_k_accuracy_5=0.937500 INFO:root:Epoch[0] Batch [16] Speed: 2183.964593 samples/sec accuracy=1.000000,
top_k_accuracy_5=0.941176 INFO:root:Epoch[0] Batch [17] Speed: 2190.022648 samples/sec accuracy=1.000000, top_k_accuracy_5=0.944444 INFO:root:Epoch[0] Batch [18] Speed: 2142.803765 samples/sec accuracy=1.000000, top_k_accuracy_5=0.947368 INFO:root:Epoch[0] Batch [19] Speed: 2128.269630 samples/sec accuracy=1.000000, top_k_accuracy_5=0.950000 INFO:root:Epoch[0] Batch [20] Speed: 2138.817161 samples/sec accuracy=1.000000, top_k_accuracy_5=0.952381 INFO:root:Epoch[0] Batch [21] Speed: 2122.069741 samples/sec accuracy=1.000000, top_k_accuracy_5=0.954545 INFO:root:Epoch[0] Batch [22] Speed: 2187.087427 samples/sec accuracy=1.000000, top_k_accuracy_5=0.956522 INFO:root:Epoch[0] Batch [23] Speed: 2153.997336 samples/sec accuracy=1.000000, top_k_accuracy_5=0.958333 INFO:root:Epoch[0] Batch [24] Speed: 2151.511277 samples/sec accuracy=1.000000, top_k_accuracy_5=0.960000 INFO:root:Epoch[0] Batch [25] Speed: 1927.757813 samples/sec accuracy=1.000000, top_k_accuracy_5=0.961538 INFO:root:Epoch[0] Batch [26] Speed: 2316.855017 samples/sec accuracy=1.000000, top_k_accuracy_5=0.962963 INFO:root:Epoch[0] Batch [27] Speed: 2260.004765 samples/sec accuracy=1.000000, top_k_accuracy_5=0.964286 INFO:root:Epoch[0] Batch [28] Speed: 2165.666585 samples/sec accuracy=1.000000, top_k_accuracy_5=0.965517 INFO:root:Epoch[0] Batch [29] Speed: 2325.495692 samples/sec accuracy=1.000000, top_k_accuracy_5=0.966667 INFO:root:Epoch[0] Batch [30] Speed: 2376.797025 samples/sec accuracy=1.000000, top_k_accuracy_5=0.967742 INFO:root:Epoch[0] Batch [31] Speed: 2369.652818 samples/sec accuracy=1.000000, top_k_accuracy_5=0.968750 INFO:root:Epoch[0] Batch [32] Speed: 2460.171438 samples/sec accuracy=1.000000, top_k_accuracy_5=0.969697 INFO:root:Epoch[0] Batch [33] Speed: 2185.253571 samples/sec accuracy=1.000000, top_k_accuracy_5=0.970588 INFO:root:Epoch[0] Batch [34] Speed: 2427.390954 samples/sec accuracy=1.000000, top_k_accuracy_5=0.971429 INFO:root:Epoch[0] Batch [35] Speed: 2398.479758 samples/sec accuracy=1.000000, top_k_accuracy_5=0.972222 INFO:root:Epoch[0] Batch [36] Speed: 2362.977769 samples/sec accuracy=1.000000, top_k_accuracy_5=0.972973 INFO:root:Epoch[0] Batch [37] Speed: 2300.701141 samples/sec accuracy=1.000000, top_k_accuracy_5=0.973684 INFO:root:Epoch[0] Batch [38] Speed: 2419.459984 samples/sec accuracy=1.000000, top_k_accuracy_5=0.974359 INFO:root:Epoch[0] Batch [39] Speed: 2052.376520 samples/sec accuracy=1.000000, top_k_accuracy_5=0.975000 INFO:root:Epoch[0] Batch [40] Speed: 2383.476415 samples/sec accuracy=1.000000, top_k_accuracy_5=0.975610 INFO:root:Epoch[0] Batch [41] Speed: 2334.881214 samples/sec accuracy=1.000000, top_k_accuracy_5=0.976190 INFO:root:Epoch[0] Batch [42] Speed: 2382.482158 samples/sec accuracy=1.000000, top_k_accuracy_5=0.976744 INFO:root:Epoch[0] Batch [43] Speed: 2269.846535 samples/sec accuracy=1.000000, top_k_accuracy_5=0.977273 INFO:root:Epoch[0] Batch [44] Speed: 2288.666934 samples/sec accuracy=1.000000, top_k_accuracy_5=0.977778 INFO:root:Epoch[0] Batch [45] Speed: 2362.738584 samples/sec accuracy=1.000000, top_k_accuracy_5=0.978261 INFO:root:Epoch[0] Batch [46] Speed: 2070.470430 samples/sec accuracy=1.000000, top_k_accuracy_5=0.978723 INFO:root:Epoch[0] Batch [47] Speed: 2272.854291 samples/sec accuracy=1.000000, top_k_accuracy_5=0.979167 INFO:root:Epoch[0] Batch [48] Speed: 2232.395025 samples/sec accuracy=1.000000, top_k_accuracy_5=0.979592 INFO:root:Epoch[0] Batch [49] Speed: 2197.246896 samples/sec accuracy=1.000000, top_k_accuracy_5=0.980000 INFO:root:Epoch[0] Batch [50] 
Speed: 2404.936959 samples/sec accuracy=1.000000, top_k_accuracy_5=0.980392 INFO:root:Epoch[0] Batch [51] Speed: 2411.732337 samples/sec accuracy=1.000000, top_k_accuracy_5=0.980769 INFO:root:Epoch[0] Batch [52] Speed: 2290.795835 samples/sec accuracy=1.000000, top_k_accuracy_5=0.981132 INFO:root:Epoch[0] Batch [53] Speed: 1863.391049 samples/sec accuracy=1.000000, top_k_accuracy_5=0.981481 INFO:root:Epoch[0] Batch [54] Speed: 1388.702278 samples/sec accuracy=1.000000, top_k_accuracy_5=0.981818 INFO:root:Epoch[0] Batch [55] Speed: 1917.601572 samples/sec accuracy=1.000000, top_k_accuracy_5=0.982143 INFO:root:Epoch[0] Batch [56] Speed: 1562.342599 samples/sec accuracy=1.000000, top_k_accuracy_5=0.982456 INFO:root:Epoch[0] Batch [57] Speed: 1610.213403 samples/sec accuracy=1.000000, top_k_accuracy_5=0.982759 INFO:root:Epoch[0] Batch [58] Speed: 1903.716551 samples/sec accuracy=1.000000, top_k_accuracy_5=0.983051 INFO:root:Epoch[0] Batch [59] Speed: 2190.898493 samples/sec accuracy=1.000000, top_k_accuracy_5=0.983333 INFO:root:Epoch[0] Batch [60] Speed: 1648.527213 samples/sec accuracy=1.000000, top_k_accuracy_5=0.983607 INFO:root:Epoch[0] Batch [61] Speed: 1567.492583 samples/sec accuracy=1.000000, top_k_accuracy_5=0.983871 INFO:root:Epoch[0] Batch [62] Speed: 1439.671858 samples/sec accuracy=1.000000, top_k_accuracy_5=0.984127 INFO:root:Epoch[0] Batch [63] Speed: 1838.045082 samples/sec accuracy=1.000000, top_k_accuracy_5=0.984375 INFO:root:Epoch[0] Batch [64] Speed: 1925.883759 samples/sec accuracy=1.000000, top_k_accuracy_5=0.984615 INFO:root:Epoch[0] Batch [65] Speed: 1418.372237 samples/sec accuracy=1.000000, top_k_accuracy_5=0.984848 INFO:root:Epoch[0] Batch [66] Speed: 1535.197685 samples/sec accuracy=1.000000, top_k_accuracy_5=0.985075 INFO:root:Epoch[0] Batch [67] Speed: 1737.688131 samples/sec accuracy=1.000000, top_k_accuracy_5=0.985294 INFO:root:Epoch[0] Batch [68] Speed: 1927.737047 samples/sec accuracy=1.000000, top_k_accuracy_5=0.985507 INFO:root:Epoch[0] Batch [69] Speed: 1889.632021 samples/sec accuracy=1.000000, top_k_accuracy_5=0.985714 INFO:root:Epoch[0] Batch [70] Speed: 1624.872618 samples/sec accuracy=1.000000, top_k_accuracy_5=0.985915 INFO:root:Epoch[0] Batch [71] Speed: 1791.229579 samples/sec accuracy=1.000000, top_k_accuracy_5=0.986111 INFO:root:Epoch[0] Batch [72] Speed: 2030.901763 samples/sec accuracy=1.000000, top_k_accuracy_5=0.986301 INFO:root:Epoch[0] Batch [73] Speed: 1657.909581 samples/sec accuracy=1.000000, top_k_accuracy_5=0.986486 INFO:root:Epoch[0] Batch [74] Speed: 1754.525975 samples/sec accuracy=1.000000, top_k_accuracy_5=0.986667 INFO:root:Epoch[0] Batch [75] Speed: 2114.081166 samples/sec accuracy=1.000000, top_k_accuracy_5=0.986842 INFO:root:Epoch[0] Batch [76] Speed: 2005.547071 samples/sec accuracy=1.000000, top_k_accuracy_5=0.987013 INFO:root:Epoch[0] Batch [77] Speed: 1223.288891 samples/sec accuracy=1.000000, top_k_accuracy_5=0.987179 INFO:root:Epoch[0] Batch [78] Speed: 1632.888602 samples/sec accuracy=1.000000, top_k_accuracy_5=0.987342 INFO:root:Epoch[0] Batch [79] Speed: 1857.749099 samples/sec accuracy=1.000000, top_k_accuracy_5=0.987500 INFO:root:Epoch[0] Batch [80] Speed: 1369.160001 samples/sec accuracy=1.000000, top_k_accuracy_5=0.987654 INFO:root:Epoch[0] Batch [81] Speed: 1581.092164 samples/sec accuracy=1.000000, top_k_accuracy_5=0.987805 INFO:root:Epoch[0] Batch [82] Speed: 1803.207970 samples/sec accuracy=1.000000, top_k_accuracy_5=0.987952 INFO:root:Epoch[0] Batch [83] Speed: 2025.232513 samples/sec accuracy=1.000000, 
top_k_accuracy_5=0.988095 INFO:root:Epoch[0] Batch [84] Speed: 2021.739536 samples/sec accuracy=1.000000, top_k_accuracy_5=0.988235 INFO:root:Epoch[0] Batch [85] Speed: 2089.666748 samples/sec accuracy=1.000000, top_k_accuracy_5=0.988372 INFO:root:Epoch[0] Batch [86] Speed: 1606.815830 samples/sec accuracy=1.000000, top_k_accuracy_5=0.988506 INFO:root:Epoch[0] Batch [87] Speed: 1784.661886 samples/sec accuracy=1.000000, top_k_accuracy_5=0.988636 INFO:root:Epoch[0] Batch [88] Speed: 1571.737384 samples/sec accuracy=1.000000, top_k_accuracy_5=0.988764 INFO:root:Epoch[0] Batch [89] Speed: 1813.305881 samples/sec accuracy=1.000000, top_k_accuracy_5=0.988889 INFO:root:Epoch[0] Batch [90] Speed: 2018.721515 samples/sec accuracy=1.000000, top_k_accuracy_5=0.989011 INFO:root:Epoch[0] Batch [91] Speed: 1937.973237 samples/sec accuracy=1.000000, top_k_accuracy_5=0.989130 INFO:root:Epoch[0] Batch [92] Speed: 2058.198976 samples/sec accuracy=1.000000, top_k_accuracy_5=0.989247 INFO:root:Epoch[0] Batch [93] Speed: 2084.692704 samples/sec accuracy=1.000000, top_k_accuracy_5=0.989362 INFO:root:Epoch[0] Batch [94] Speed: 1540.413033 samples/sec accuracy=1.000000, top_k_accuracy_5=0.989474 INFO:root:Epoch[0] Batch [95] Speed: 1467.987477 samples/sec accuracy=1.000000, top_k_accuracy_5=0.989583 INFO:root:Epoch[0] Batch [96] Speed: 1834.095430 samples/sec accuracy=1.000000, top_k_accuracy_5=0.989691 INFO:root:Epoch[0] Batch [97] Speed: 1582.350376 samples/sec accuracy=1.000000, top_k_accuracy_5=0.989796 INFO:root:Epoch[0] Batch [98] Speed: 1732.048380 samples/sec accuracy=1.000000, top_k_accuracy_5=0.989899 INFO:root:Epoch[0] Batch [99] Speed: 1598.191591 samples/sec accuracy=1.000000, top_k_accuracy_5=0.990000
INFO:root:[Epoch 0] training: accuracy=1.000000, top_k_accuracy_5=0.990000
INFO:root:[Epoch 0] time cost: 7.557455
Traceback (most recent call last):
  File "../gluon/image_classification.py", line 290, in <module>
    main()
  File "../gluon/image_classification.py", line 274, in main
    train(opt, context)
  File "../gluon/image_classification.py", line 242, in train
    name, val_acc = test(ctx, val_data)
  File "../gluon/image_classification.py", line 166, in test
    outputs.append(net(x))
  File "/usr/local/lib/python3.5/dist-packages/mxnet/gluon/block.py", line 481, in __call__
    out = self.forward(*args)
  File "/usr/local/lib/python3.5/dist-packages/mxnet/gluon/block.py", line 821, in forward
    return self.hybrid_forward(ndarray, x, *args, **params)
  File "/usr/local/lib/python3.5/dist-packages/mxnet/gluon/model_zoo/vision/alexnet.py", line 65, in hybrid_forward
    x = self.features(x)
  File "/usr/local/lib/python3.5/dist-packages/mxnet/gluon/block.py", line 481, in __call__
    out = self.forward(*args)
  File "/usr/local/lib/python3.5/dist-packages/mxnet/gluon/block.py", line 821, in forward
    return self.hybrid_forward(ndarray, x, *args, **params)
  File "/usr/local/lib/python3.5/dist-packages/mxnet/gluon/nn/basic_layers.py", line 117, in hybrid_forward
    x = block(x)
  File "/usr/local/lib/python3.5/dist-packages/mxnet/gluon/block.py", line 481, in __call__
    out = self.forward(*args)
  File "/usr/local/lib/python3.5/dist-packages/mxnet/gluon/block.py", line 821, in forward
    return self.hybrid_forward(ndarray, x, *args, **params)
  File "/usr/local/lib/python3.5/dist-packages/mxnet/gluon/nn/conv_layers.py", line 135, in hybrid_forward
    act = getattr(F, self._op_name)(x, weight, bias, name='fwd', **self._kwargs)
  File "<string>", line 167, in Convolution
  File "/usr/local/lib/python3.5/dist-packages/mxnet/_ctypes/ndarray.py", line 92, in _imperative_invoke
    ctypes.byref(out_stypes)))
  File "/usr/local/lib/python3.5/dist-packages/mxnet/base.py", line 210, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [11:25:54] src/operator/nn/convolution.cc:283: Check failed: (*in_type)[i] == dtype (2 vs. 0) This layer requires uniform type. Expected 'float32' v.s. given 'float16' at 'weight'

Stack trace returned 10 entries:
[bt] (0) /usr/local/lib/python3.5/dist-packages/mxnet/libmxnet.so(+0x34d0ea) [0x7f28b86da0ea]
[bt] (1) /usr/local/lib/python3.5/dist-packages/mxnet/libmxnet.so(+0x34d711) [0x7f28b86da711]
[bt] (2) /usr/local/lib/python3.5/dist-packages/mxnet/libmxnet.so(+0x59284c) [0x7f28b891f84c]
[bt] (3) /usr/local/lib/python3.5/dist-packages/mxnet/libmxnet.so(+0x26d963f) [0x7f28baa6663f]
[bt] (4) /usr/local/lib/python3.5/dist-packages/mxnet/libmxnet.so(+0x26e2cad) [0x7f28baa6fcad]
[bt] (5) /usr/local/lib/python3.5/dist-packages/mxnet/libmxnet.so(+0x25ffe99) [0x7f28ba98ce99]
[bt] (6) /usr/local/lib/python3.5/dist-packages/mxnet/libmxnet.so(MXImperativeInvokeEx+0x6f) [0x7f28ba98d48f]
[bt] (7) /usr/lib/python3.5/lib-dynload/_ctypes.cpython-35m-x86_64-linux-gnu.so(ffi_call_unix64+0x4c) [0x7f29479b7e20]
[bt] (8) /usr/lib/python3.5/lib-dynload/_ctypes.cpython-35m-x86_64-linux-gnu.so(ffi_call+0x2eb) [0x7f29479b788b]
[bt] (9) /usr/lib/python3.5/lib-dynload/_ctypes.cpython-35m-x86_64-linux-gnu.so(_ctypes_callproc+0x49a) [0x7f29479b201a]
```

## Minimum reproducible example

(If you are using your own code, please provide a short script that reproduces the error. Otherwise, please provide a link to the existing example.)

## Steps to reproduce

(Paste the commands you ran that produced the error.)

1. python3 ../gluon/image_classification.py --dataset dummy --gpus 2 --epochs 10 --mode imperative --model resnet50_v2 --batch-size 128 --log-interval 1 --dtype float16
2. python3 ../gluon/image_classification.py --dataset dummy --gpus 2 --epochs 10 --mode imperative --model alexnet --batch-size 128 --log-interval 1 --dtype float16

## What have you tried to solve it?

I have no idea how to solve this.
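For context, the traceback points at `test()` in image_classification.py (line 166, `outputs.append(net(x))`), where the validation batch apparently reaches the float16-cast network still as float32. Below is a hedged sketch of one possible shape of a fix, casting each validation batch to the requested dtype before the forward pass; the function signature and the batch/iterator attributes here are illustrative assumptions, not the script's actual code.

```python
import mxnet as mx
from mxnet import gluon

def test(ctx, val_data, net, dtype):
    """Sketch of a validation loop that casts each batch to the training
    dtype (e.g. 'float16') before calling the network. Illustrative only;
    the example script's test() takes different arguments."""
    metric = mx.metric.Accuracy()
    for batch in val_data:
        # Cast the data to the network's dtype; labels stay as-is.
        data = gluon.utils.split_and_load(batch.data[0].astype(dtype),
                                          ctx_list=ctx, batch_axis=0)
        label = gluon.utils.split_and_load(batch.label[0],
                                           ctx_list=ctx, batch_axis=0)
        outputs = [net(x) for x in data]
        metric.update(label, outputs)
    return metric.get()
```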