mkolod commented on a change in pull request #11325: [MXNET-703] TensorRT runtime integration
URL: https://github.com/apache/incubator-mxnet/pull/11325#discussion_r205894176
##########
File path: src/executor/graph_executor.cc
##########
@@ -941,6 +970,114 @@ void GraphExecutor::FinishInitGraph(nnvm::Symbol symbol,
this->InitOpSegs();
}
+/*!
+ * \brief This function is triggered after each tensorrt subgraph replacement pass.
+ * Reset arguments of GraphExecutor::Init(...) as some variables (weights and biases)
+ * are absorbed into the TRT engine it also it rerun attributes inferences accordingly
+ * to the new topology.
+ */
+Graph GraphExecutor::ReinitGraph(Graph&& g, const Context &default_ctx,
+ const std::map<std::string, Context> &ctx_map,
+ std::vector<Context> *in_arg_ctxes,
+ std::vector<Context> *arg_grad_ctxes,
+ std::vector<Context> *aux_state_ctxes,
+ std::vector<OpReqType> *grad_req_types,
+ std::unordered_map<std::string, TShape> *arg_shape_map,
+ std::unordered_map<std::string, int> *arg_dtype_map,
+ std::unordered_map<std::string, int> *arg_stype_map,
+ std::unordered_map<std::string, NDArray> *params_map) {
+ std::unordered_set<std::string> to_remove_params;
+ for (auto& el : *params_map) {
+ to_remove_params.insert(el.first);
+ }
+
+ DFSVisit(g.outputs, [&to_remove_params](const nnvm::NodePtr n) {
+ to_remove_params.erase(n->attrs.name);
+ });
+
+ for (auto& el : to_remove_params) {
+ params_map->erase(el);
+ arg_shape_map->erase(el);
+ arg_dtype_map->erase(el);
+ arg_stype_map->erase(el);
+ }
+ const auto &idx = g.indexed_graph();
+ num_forward_inputs_ = idx.input_nodes().size();
+ in_arg_ctxes->resize(num_forward_inputs_ - idx.mutable_input_nodes().size());
Review comment:
@zhengda I think it can, but we haven't been able to get it to work so far.
The bind() method for module does not take in the shared_buffer, and the
TensorRT engine builder needs it to bake the weights into the engine, which is
something TensorRT requires. Regarding the graph rewrite, note that it takes
place very early in the bind process. Shape inference happens before the
rewrite, but memory allocation, etc., does not. So from a data parallel
perspective I think it should work, because resource allocation is done after
the rewrite, not before. Also, shapes are inferred again after the graph
rewrite, so the rest of the bind process proceeds as if there had been no
rewrite.
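
To make the parameter pruning in the hunk above concrete, here is a minimal
self-contained sketch of the same idea: collect every parameter name, erase
the names still reachable from the graph outputs, and drop whatever is left.
ToyNode, ToyDFSVisit and PruneAbsorbedParams are hypothetical stand-ins made
up for illustration; they are not the nnvm types or the DFSVisit used in the
PR.

```cpp
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Toy stand-in for a graph node (hypothetical; not nnvm::Node).
struct ToyNode {
  std::string name;
  std::vector<std::shared_ptr<ToyNode>> inputs;
};
using ToyNodePtr = std::shared_ptr<ToyNode>;

// Visit every node reachable from the outputs exactly once (depth-first).
void ToyDFSVisit(const std::vector<ToyNodePtr>& outputs,
                 const std::function<void(const ToyNodePtr&)>& fn) {
  std::unordered_set<const ToyNode*> seen;
  std::vector<ToyNodePtr> stack(outputs.begin(), outputs.end());
  while (!stack.empty()) {
    ToyNodePtr n = stack.back();
    stack.pop_back();
    if (!seen.insert(n.get()).second) continue;  // already visited
    fn(n);
    for (const auto& in : n->inputs) stack.push_back(in);
  }
}

// Erase every entry of *params whose name no longer appears in the graph,
// e.g. weights and biases that were absorbed into a fused engine node.
template <typename V>
void PruneAbsorbedParams(const std::vector<ToyNodePtr>& outputs,
                         std::unordered_map<std::string, V>* params) {
  // Start by assuming every parameter is unused...
  std::unordered_set<std::string> to_remove;
  for (const auto& kv : *params) to_remove.insert(kv.first);
  // ...then keep the ones whose names are still present in the graph.
  ToyDFSVisit(outputs, [&to_remove](const ToyNodePtr& n) {
    to_remove.erase(n->name);
  });
  for (const auto& name : to_remove) params->erase(name);
}
```

In this sketch, a graph whose only remaining nodes are "data" and a fused
engine node would lose entries such as "fc_weight" and "fc_bias" from the
map, while "data" is kept. The hunk applies the same erase to the shape,
dtype and stype maps so that they stay consistent with params_map.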