IvyGongoogle commented on issue #14159: [Feature Request] Support fp16 for c++ api
URL: https://github.com/apache/incubator-mxnet/issues/14159#issuecomment-483493526

I modified `src/c_api/c_predict_api.cc` at [L211](https://github.com/apache/incubator-mxnet/blob/master/src/c_api/c_predict_api.cc#L211) to:

```cpp
std::vector<NDArray> arg_arrays, aux_arrays;
for (size_t i = 0; i < arg_shapes.size(); ++i) {
  if (arg_params.count(arg_names[i]) != 0) {
    // Create the argument array with the dtype stored in the checkpoint
    // (e.g. fp16) instead of the default fp32.
    NDArray nd = NDArray(arg_shapes[i], ctx, false, arg_params[arg_names[i]].dtype());
    CopyFromTo(arg_params[arg_names[i]], &nd);
    arg_arrays.push_back(nd);
  } else {
    NDArray nd = NDArray(arg_shapes[i], ctx);
    arg_arrays.push_back(nd);
  }
}
for (size_t i = 0; i < aux_shapes.size(); ++i) {
  if (aux_params.count(aux_names[i]) != 0) {
    // Same for auxiliary states: keep the checkpoint's dtype.
    NDArray nd = NDArray(aux_shapes[i], ctx, false, aux_params[aux_names[i]].dtype());
    CopyFromTo(aux_params[aux_names[i]], &nd);
    aux_arrays.push_back(nd);
  } else {
    NDArray nd = NDArray(aux_shapes[i], ctx);
    aux_arrays.push_back(nd);
  }
}
```

With this change I can successfully run inference on my trained fp16 model through the C++ API, but the speed is the same as fp32 for both my CNN and my OCR (LSTM) model. So why is fp16 not faster than fp32? Or does my modification to `src/c_api/c_predict_api.cc` above not actually take effect?
