masahi commented on code in PR #11341:
URL: https://github.com/apache/tvm/pull/11341#discussion_r876334023
##########
python/tvm/relay/op/strategy/generic.py:
##########
@@ -1760,6 +1769,44 @@ def cumsum_strategy(attrs, inputs, out_type, target):
return strategy
+@override_native_generic_func("concat_strategy")
+def concatenate_strategy(attrs, inputs, out_type, target):
+ """concatenate generic strategy"""
+ strategy = _op.OpStrategy()
+ strategy.add_implementation(
+ wrap_compute_concat(topi.concatenate),
+ wrap_topi_schedule(topi.generic.schedule_extern),
+ name="concatenate",
+ )
+ return strategy
+
+
+@concatenate_strategy.register(["cpu"])
+def concatenate_strategy_cpu(attrs, inputs, out_type, target):
Review Comment:
Move it to `strategy/x86.py`
##########
src/te/schedule/schedule_dataflow_rewrite.cc:
##########
@@ -525,8 +542,13 @@ void InjectInline(ScheduleNode* sch, bool
feature_extraction_mode) {
for (auto iv : compute->axis) {
args.push_back(iv->var);
}
+ if (ext_ops.find(stage->op) != ext_ops.end()) {
+ // sshtin: The extern op can try to get access to the input tensors
as a row data,
+ // that can lead to error in TE scripts.
+ stage->attach_type = kGroupRoot;
+ continue;
+ }
Review Comment:
So the new implementation of concat is implemented via IRBuilder, hence it
is an `extern` op. The "op pattern" of an extern op is supposed to be `kOpaque`
for fusion, but since the existing implementation is of kind `kInjective` and
@shtinsa wants to preserve op fusion as before, he came up with the hack above.
thoughts? @tqchen
##########
python/tvm/relay/op/strategy/generic.py:
##########
@@ -1760,6 +1769,44 @@ def cumsum_strategy(attrs, inputs, out_type, target):
return strategy
+@override_native_generic_func("concat_strategy")
+def concatenate_strategy(attrs, inputs, out_type, target):
+ """concatenate generic strategy"""
+ strategy = _op.OpStrategy()
+ strategy.add_implementation(
+ wrap_compute_concat(topi.concatenate),
+ wrap_topi_schedule(topi.generic.schedule_extern),
+ name="concatenate",
+ )
+ return strategy
+
+
+@concatenate_strategy.register(["cpu"])
+def concatenate_strategy_cpu(attrs, inputs, out_type, target):
+ """concatenate x86 strategy"""
+ strategy = _op.OpStrategy()
+ use_old_concat = False
+ for inpt in inputs:
+ shape = inpt.shape
+ for i in shape:
+ if not isinstance(i, tir.expr.IntImm):
+ use_old_concat = True
+ break
+ if use_old_concat:
+ strategy.add_implementation(
+ wrap_compute_concat(topi.transform.concatenate),
+ wrap_topi_schedule(topi.x86.injective.schedule_concatenate),
+ name="concatenate.generic",
+ )
+ else:
+ strategy.add_implementation(
+ wrap_compute_concat(topi.x86.concatenate),
+ wrap_topi_schedule(topi.x86.schedule_concatenate_cpu),
+ name="concatenate.cpu",
+ )
Review Comment:
It looks like you want to use the new impl by default unless the input has
dynamic shape. Have you benchmarked that the new impl is always faster?
A better way, as we discussed before, is to let autotvm pick the better one
based on tuning results. You need to add two implementations to the strategy,
and choose the default one. That shouldn't be a controversial change.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]