dcslin edited a comment on pull request #792: URL: https://github.com/apache/singa/pull/792#issuecomment-688418118
Current result:

- [x] training with fp16 ok, with graph, comparable accuracy
- [x] tensor cuda backend generic support on fp16 with broadcast
- [ ] review operations reusing float32

```
root@1c6aaef3db53:~/singa-hp2# PYTHONPATH=build/python/ python3 examples/cnn/train_cnn.py mlp mnist -m5
Starting Epoch 0:
Training loss = 446.399231, training accuracy = 0.870331
Evaluation accuracy = 0.922676, Elapsed Time = 4.054065s
Starting Epoch 1:
Training loss = 246.745819, training accuracy = 0.926194
Evaluation accuracy = 0.938301, Elapsed Time = 3.921566s
Starting Epoch 2:
Training loss = 201.893021, training accuracy = 0.939384
Evaluation accuracy = 0.944611, Elapsed Time = 3.735095s
Starting Epoch 3:
Training loss = 171.419769, training accuracy = 0.948289
Evaluation accuracy = 0.952524, Elapsed Time = 3.625971s
Starting Epoch 4:
Training loss = 149.009338, training accuracy = 0.955326
Evaluation accuracy = 0.956530, Elapsed Time = 3.582685s

root@1c6aaef3db53:~/singa-hp2# PYTHONPATH=build/python/ python3 examples/cnn/train_cnn.py mlp mnist -m5 -pfloat16
Starting Epoch 0:
Training loss = 447.799744, training accuracy = 0.869547
Evaluation accuracy = 0.922075, Elapsed Time = 3.899604s
Starting Epoch 1:
Training loss = 249.704956, training accuracy = 0.925110
Evaluation accuracy = 0.937300, Elapsed Time = 2.524199s
Starting Epoch 2:
Training loss = 206.520721, training accuracy = 0.938334
Evaluation accuracy = 0.942809, Elapsed Time = 2.410751s
Starting Epoch 3:
Training loss = 177.916901, training accuracy = 0.946538
Evaluation accuracy = 0.950120, Elapsed Time = 2.390487s
Starting Epoch 4:
Training loss = 157.046936, training accuracy = 0.952958
Evaluation accuracy = 0.954828, Elapsed Time = 2.396067s

root@1c6aaef3db53:~/singa-hp2# PYTHONPATH=build/python/ python3 examples/cnn/train_cnn.py cnn mnist -m5 -pfloat32
Starting Epoch 0:
Training loss = 596.964600, training accuracy = 0.789421
Evaluation accuracy = 0.943209, Elapsed Time = 7.073203s
Starting Epoch 1:
Training loss = 234.664322, training accuracy = 0.920758
Evaluation accuracy = 0.960036, Elapsed Time = 6.908865s
Starting Epoch 2:
Training loss = 165.501694, training accuracy = 0.944454
Evaluation accuracy = 0.971254, Elapsed Time = 6.795328s
Starting Epoch 3:
Training loss = 138.790848, training accuracy = 0.953559
Evaluation accuracy = 0.968950, Elapsed Time = 6.864943s
Starting Epoch 4:
Training loss = 119.547195, training accuracy = 0.959595
Evaluation accuracy = 0.970553, Elapsed Time = 10.432533s

root@1c6aaef3db53:~/singa-hp2# PYTHONPATH=build/python/ python3 examples/cnn/train_cnn.py cnn mnist -m5 -pfloat16
Starting Epoch 0:
Training loss = 598.742554, training accuracy = 0.752268
Evaluation accuracy = 0.941506, Elapsed Time = 13.717912s
Starting Epoch 1:
Training loss = 238.977264, training accuracy = 0.875350
Evaluation accuracy = 0.958934, Elapsed Time = 14.170568s
Starting Epoch 2:
Training loss = 169.415573, training accuracy = 0.898046
Evaluation accuracy = 0.969151, Elapsed Time = 13.457300s
Starting Epoch 3:
Training loss = 142.731216, training accuracy = 0.905600
Evaluation accuracy = 0.968850, Elapsed Time = 13.270982s
Starting Epoch 4:
Training loss = 121.980347, training accuracy = 0.911153
Evaluation accuracy = 0.971254, Elapsed Time = 9.463192s
```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: [email protected]
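
To make the fp16 vs fp32 comparison in the logs easier to read, here is a small sketch (not part of the PR) that averages the per-epoch elapsed times and compares the final evaluation accuracy. The numbers are copied from the four runs in the comment. Two things are assumptions: that the first `mlp` run, which has no `-p` flag, used float32, and the names `elapsed`, `final_eval_acc` and `compare`, which are made up for this illustration.

```python
# Elapsed time per epoch in seconds, copied from the logs in the comment.
# Assumption: the first mlp run (no -p flag) ran in float32.
elapsed = {
    ("mlp", "float32"): [4.054065, 3.921566, 3.735095, 3.625971, 3.582685],
    ("mlp", "float16"): [3.899604, 2.524199, 2.410751, 2.390487, 2.396067],
    ("cnn", "float32"): [7.073203, 6.908865, 6.795328, 6.864943, 10.432533],
    ("cnn", "float16"): [13.717912, 14.170568, 13.457300, 13.270982, 9.463192],
}

# Evaluation accuracy after the last epoch (epoch 4) of each run.
final_eval_acc = {
    ("mlp", "float32"): 0.956530,
    ("mlp", "float16"): 0.954828,
    ("cnn", "float32"): 0.970553,
    ("cnn", "float16"): 0.971254,
}


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def compare(model):
    """Return (float32 time / float16 time, float16 acc - float32 acc).

    A time ratio above 1 means the float16 run was faster on average.
    """
    t32 = mean(elapsed[(model, "float32")])
    t16 = mean(elapsed[(model, "float16")])
    acc_gap = final_eval_acc[(model, "float16")] - final_eval_acc[(model, "float32")]
    return t32 / t16, acc_gap


for model in ("mlp", "cnn"):
    ratio, gap = compare(model)
    print(f"{model}: fp32/fp16 time ratio = {ratio:.2f}, eval accuracy gap = {gap:+.6f}")
```

On these logs the mean-time ratio comes out at about 1.39 for `mlp` and about 0.59 for `cnn`, so float16 is faster for the MLP but slower for the CNN, while the final evaluation accuracy differs by less than 0.002 in both cases. Five epochs from a single run each is a small sample, so treat the ratios as indicative only.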
