cyrusbehr opened a new issue #20600:
URL: https://github.com/apache/incubator-mxnet/issues/20600
I'm using MXNet 1.8.0.
How can I efficiently deal with variable input batch sizes in MXNet using the C++ API?
Initially, my inference code looks something like the following:
```cpp
ErrorCode FaceRecognizerGPU::getFaceFeatureVectors(
        std::vector<cv::cuda::GpuMat> &alignedFaceImages,
        std::vector<Faceprint> &faceprints) {
    // auto t1 = Clock::now();
    auto data = AsData(alignedFaceImages, ctx);
    if (exec != nullptr) {
        if (args["data"].GetShape()[0] != alignedFaceImages.size()) {
            delete exec;
            args["data"] = NDArray(Shape(alignedFaceImages.size(), 3, 112, 112), ctx, false);
            exec = net.SimpleBind(
                ctx, args, std::map<std::string, NDArray>(),
                std::map<std::string, OpReqType>(), auxs);
        }
    }
    data.CopyTo(&(exec->arg_dict()["data"]));
    exec->Forward(false);
    auto embeddings = exec->outputs[0].Copy(Context(kCPU, 0));
    embeddings.WaitToRead();
    // Rest of code here....
}
```
The issue with the above is that any time the batch size changes, the executor is deleted and `SimpleBind` is run with the new input size. That is a slow operation (I assume it allocates GPU memory and does some other work under the hood), so rapidly switching between batch sizes becomes quite expensive.
Therefore, what I'd like is a scheme where the executor is only deleted and re-instantiated if the batch size increases. If the batch size decreases, we can reuse the existing GPU allocation instead of deleting and re-allocating it (though we would probably need to specify somehow that the input shape has changed). I know this can be done with TensorRT, but I'm not sure how to implement it in MXNet. I was hoping I could do something like the following:
```cpp
ErrorCode FaceRecognizerGPU::getFaceFeatureVectors(
        std::vector<cv::Mat> &alignedFaceImages,
        std::vector<Faceprint> &faceprints) {
    auto data = AsData(alignedFaceImages, ctx);
    if (!m_exec || m_exec->arg_dict()["data"].GetShape()[0] < alignedFaceImages.size()) {
        if (m_exec) {
            delete m_exec;
        }
        m_args["data"] = NDArray(Shape(alignedFaceImages.size(), 3, 112, 112), ctx, false);
        m_exec = m_net.SimpleBind(
            ctx, m_args, std::map<std::string, NDArray>(),
            std::map<std::string, OpReqType>(), m_auxs);
    }
    data.CopyTo(&(m_exec->arg_dict()["data"]));
    m_exec->Forward(false);
    auto embeddings = m_exec->outputs[0].Copy(Context(kCPU, 0));
    embeddings.WaitToRead();
    // Rest of code here....
}
```
However, it fails when the batch size is decreased because the input size no longer matches what the executor expects (hence the need to tell it somehow that the input shape has changed, without changing the GPU memory allocation):
```
[14:15:27] /home/cyrus/work/c-sdks/3rd_party_libs/mxnet/build_cuda_11/packaged/include/mxnet-cpp/operator.hpp:141: MXNetError: Check failed: assign(&dattr, vec.at(i)): Incompatible attr in node at 0-th output: expected [46,3,112,112], got [48,3,112,112]
```
Does anyone know how this can be done?
I suspect it may be possible with the `mxnet::cpp::Executor::Reshape` function, but I can't find any examples of how it's used.
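In the meantime, to illustrate what I mean by reusing the existing allocation, here is a rough, unverified sketch of one possible workaround (using the same member names as my attempt above): bind once for the largest batch size seen so far, then feed smaller batches through a slice of the bound input and read back only the matching slice of the output. It assumes that `NDArray::Slice` along the first dimension returns a view that shares the executor's GPU buffer, and that the stale rows left in the padded batch don't affect the rows I actually read back at inference time; I haven't verified either assumption.
```cpp
// Rough sketch (unverified): bind once for the largest batch size seen so
// far, then copy smaller batches into a slice of the bound input and read
// back only the matching slice of the output.
ErrorCode FaceRecognizerGPU::getFaceFeatureVectors(
        std::vector<cv::Mat> &alignedFaceImages,
        std::vector<Faceprint> &faceprints) {
    const size_t batchSize = alignedFaceImages.size();
    auto data = AsData(alignedFaceImages, ctx);

    // Re-bind only when no executor exists yet or the batch has grown
    // beyond the currently bound capacity.
    if (!m_exec || m_exec->arg_dict()["data"].GetShape()[0] < batchSize) {
        delete m_exec;  // deleting a nullptr is a no-op
        m_args["data"] = NDArray(Shape(batchSize, 3, 112, 112), ctx, false);
        m_exec = m_net.SimpleBind(
            ctx, m_args, std::map<std::string, NDArray>(),
            std::map<std::string, OpReqType>(), m_auxs);
    }

    // Copy the real samples into the first `batchSize` rows of the bound
    // input; rows beyond that keep whatever data they already held.
    NDArray inputView = m_exec->arg_dict()["data"].Slice(0, batchSize);
    data.CopyTo(&inputView);

    m_exec->Forward(false);

    // Only the first `batchSize` output rows correspond to real inputs.
    auto embeddings =
        m_exec->outputs[0].Slice(0, batchSize).Copy(Context(kCPU, 0));
    embeddings.WaitToRead();
    // Rest of code here....
}
```
The obvious downside is that the forward pass always runs at the full bound batch size, so small batches still pay the cost of the large one, which is why something like `Reshape` would be preferable if it can reuse the existing allocation.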