cyrusbehr opened a new issue #20600:
URL: https://github.com/apache/incubator-mxnet/issues/20600
I'm using MXNet 1.8.0.
How can I efficiently deal with variable input batch sizes in MXNet using the C++ API?
Initially, my inference code looks something like the following:
```cpp
ErrorCode FaceRecognizerGPU::getFaceFeatureVectors(
        std::vector<cv::cuda::GpuMat> &alignedFaceImages,
        std::vector<Faceprint> &faceprints) {
    // auto t1 = Clock::now();
    auto data = AsData(alignedFaceImages, ctx);
    if (exec != nullptr) {
        if (args["data"].GetShape()[0] != alignedFaceImages.size()) {
            delete exec;
            args["data"] = NDArray(Shape(alignedFaceImages.size(), 3, 112, 112), ctx, false);
            exec = net.SimpleBind(
                ctx, args, std::map<std::string, NDArray>(),
                std::map<std::string, OpReqType>(), auxs);
        }
    }
    data.CopyTo(&(exec->arg_dict()["data"]));
    exec->Forward(false);
    auto embeddings = exec->outputs[0].Copy(Context(kCPU, 0));
    embeddings.WaitToRead();
    // Rest of code here....
}
```
The issue with the above is that any time the batch size changes, the executor is deleted and `SimpleBind` is run with the new input size. That is a slow operation (I assume it allocates GPU memory and does some other work under the hood), so rapidly switching between batch sizes becomes quite expensive.
Therefore, what I'd like is a scheme where the executor is only deleted and re-instantiated if the batch size increases. If the batch size decreases, we can reuse the existing GPU allocation instead of deleting and re-allocating it (though we would probably need to specify somehow that the input shape has changed). I know this can be done with TensorRT, but I'm not sure how to implement it in MXNet. I was hoping I could do something like the following:
```cpp
ErrorCode FaceRecognizerGPU::getFaceFeatureVectors(
        std::vector<cv::Mat> &alignedFaceImages,
        std::vector<Faceprint> &faceprints) {
    auto data = AsData(alignedFaceImages, ctx);
    if (!m_exec || m_exec->arg_dict()["data"].GetShape()[0] < alignedFaceImages.size()) {
        if (m_exec) {
            delete m_exec;
        }
        m_args["data"] = NDArray(Shape(alignedFaceImages.size(), 3, 112, 112), ctx, false);
        m_exec = m_net.SimpleBind(
            ctx, m_args, std::map<std::string, NDArray>(),
            std::map<std::string, OpReqType>(), m_auxs);
    }
    data.CopyTo(&(m_exec->arg_dict()["data"]));
    m_exec->Forward(false);
    auto embeddings = m_exec->outputs[0].Copy(Context(kCPU, 0));
    embeddings.WaitToRead();
    // Rest of code here....
}
```
However, it fails when the batch size is decreased because the input size no longer matches what the executor expects (hence the need to tell it somehow that the input shape has changed, without changing the GPU memory allocation):
```
[14:15:27] /home/cyrus/work/c-sdks/3rd_party_libs/mxnet/build_cuda_11/packaged/include/mxnet-cpp/operator.hpp:141: MXNetError: Check failed: assign(&dattr, vec.at(i)): Incompatible attr in node at 0-th output: expected [46,3,112,112], got [48,3,112,112]
```
Does anyone know how this can be done?
I suspect it may be possible with the `mxnet::cpp::Executor::Reshape` function, but I can't find any examples of how it's used.
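In the meantime, to illustrate what I mean by reusing the existing allocation, here is a rough, unverified sketch of one possible workaround (using the same member names as my attempt above): bind once for the largest batch size seen so far, then feed smaller batches through a slice of the bound input and read back only the matching slice of the output. It assumes that `NDArray::Slice` along the first dimension returns a view that shares the executor's GPU buffer, and that the stale rows left in the padded batch don't affect the rows I actually read back at inference time; I haven't verified either assumption.
```cpp
// Rough sketch (unverified): bind once for the largest batch size seen so
// far, then copy smaller batches into a slice of the bound input and read
// back only the matching slice of the output.
ErrorCode FaceRecognizerGPU::getFaceFeatureVectors(
        std::vector<cv::Mat> &alignedFaceImages,
        std::vector<Faceprint> &faceprints) {
    const size_t batchSize = alignedFaceImages.size();
    auto data = AsData(alignedFaceImages, ctx);

    // Re-bind only when no executor exists yet or the batch has grown
    // beyond the currently bound capacity.
    if (!m_exec || m_exec->arg_dict()["data"].GetShape()[0] < batchSize) {
        delete m_exec;  // deleting a nullptr is a no-op
        m_args["data"] = NDArray(Shape(batchSize, 3, 112, 112), ctx, false);
        m_exec = m_net.SimpleBind(
            ctx, m_args, std::map<std::string, NDArray>(),
            std::map<std::string, OpReqType>(), m_auxs);
    }

    // Copy the real samples into the first `batchSize` rows of the bound
    // input; rows beyond that keep whatever data they already held.
    NDArray inputView = m_exec->arg_dict()["data"].Slice(0, batchSize);
    data.CopyTo(&inputView);

    m_exec->Forward(false);

    // Only the first `batchSize` output rows correspond to real inputs.
    auto embeddings =
        m_exec->outputs[0].Slice(0, batchSize).Copy(Context(kCPU, 0));
    embeddings.WaitToRead();
    // Rest of code here....
}
```
The obvious downside is that the forward pass always runs at the full bound batch size, so small batches still pay the cost of the large one, which is why something like `Reshape` would be preferable if it can reuse the existing allocation.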