zhiics commented on a change in pull request #6655:
URL: https://github.com/apache/tvm/pull/6655#discussion_r530535123
##########
File path: src/relay/transforms/annotate_target.cc
##########
@@ -128,14 +143,31 @@ class AnnotateTargetRewriter : public ExprRewriter {
* \return An annotated and target-propagated relay expression.
*/
Expr new_expr = expr;
- if (op_expr_to_target_.find(expr) != op_expr_to_target_.end() && FreeVars(expr).size() != 0) {
- new_expr = InsertAnnotation(expr, op_expr_to_target_[expr], make_end_op);
- op_expr_to_target_[new_expr] = op_expr_to_target_[expr];
+ const CallNode* call = expr.as<CallNode>();
+ if (op_expr_to_target_.find(expr) != op_expr_to_target_.end()) {
+ // Check whether expr has args, if not - do not insert compiler_end.
+ if (expr->IsInstance<RefWriteNode>() || expr->IsInstance<RefCreateNode>() ||
+ expr->IsInstance<RefReadNode>() || expr->IsInstance<TupleNode>() ||
Review comment:
There would be more nodes to handle here, such as constructors. But I am still
concerned about whether this change is needed at all. It makes an already
complicated pass even more complicated. I still don't see a good reason not to
run MergeCompilerRegions, which would solve this problem. Without running it, we
would end up with a large number of small segments, which requires frequent data
transfer between the host and device as well as frequent kernel launches.
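
To make the concern about small segments concrete, here is a toy sketch. It is
not TVM's MergeCompilerRegions, which works on a dataflow graph and has to
respect data dependencies. It assumes a simple linear chain of operators, and
the names Region, OneRegionPerOp, MergeAdjacent and CountOffloadedRegions are
hypothetical helpers made up for this illustration. The target string "dnnl" is
only a placeholder for some external compiler, and "default" stands for the host.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model (assumption): a linear chain of operators, each tagged with a
// target name. "default" means the host; anything else is offloaded.
struct Region {
  std::string target;   // target that executes this region
  std::size_t num_ops;  // number of operators inside the region
};

// One region per operator: roughly what partitioning sees when every
// annotated op is wrapped on its own and nothing is merged.
std::vector<Region> OneRegionPerOp(const std::vector<std::string>& op_targets) {
  std::vector<Region> regions;
  for (const auto& target : op_targets) {
    regions.push_back({target, 1});
  }
  return regions;
}

// Hypothetical helper: fuse neighbouring regions that share a target.
std::vector<Region> MergeAdjacent(const std::vector<Region>& regions) {
  std::vector<Region> merged;
  for (const auto& region : regions) {
    assert(region.num_ops > 0);  // every region holds at least one op
    if (!merged.empty() && merged.back().target == region.target) {
      merged.back().num_ops += region.num_ops;
    } else {
      merged.push_back(region);
    }
  }
  return merged;
}

// In this model each offloaded region costs one launch plus a transfer
// to the device and back, so fewer regions means fewer round trips.
std::size_t CountOffloadedRegions(const std::vector<Region>& regions) {
  std::size_t count = 0;
  for (const auto& region : regions) {
    if (region.target != "default") ++count;
  }
  return count;
}
```

For a chain tagged dnnl, dnnl, dnnl, default, dnnl, dnnl, the unmerged version
has five offloaded regions while the merged version has two, so in this model
the number of host/device round trips and launches drops from five to two.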