The limitation is perhaps more with TE/TIR than with TOPI, in that currently 
*all* the scheduling decisions need to happen together. The proposed changes 
with TensorIR would lift that constraint, but for that to actually be useful 
the FuseOps pass would have to become a TIR pass rather than a Relay one. 
Otherwise the graph is already split up before we even get the chance to 
schedule it.

I did consider whether Ansor might be a good fit for this, but there seemed 
to be three main issues. First, for us this technique is mostly about reducing 
peak memory requirements (although it can also benefit performance by reducing 
bandwidth). Ansor currently optimizes solely for performance, whereas we'd 
want to keep a Pareto frontier of performance versus memory usage. Second, 
we'd prefer not to have to use auto-tuning approaches except where absolutely 
necessary. Third, this involves considering many different ways to break a 
graph down into sets of cascaded subgraphs, but my understanding is that Ansor 
does (or maybe will) only use a fixed set of rules to create the subgraphs.
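
To make the Pareto frontier point concrete, here is a minimal sketch of what keeping such a frontier could look like. It is not Ansor or TVM code: the `Plan` type, the plan names and all the cycle and byte figures are made up purely for illustration.

```python
from typing import List, NamedTuple


class Plan(NamedTuple):
    """Hypothetical candidate schedule; the fields are illustrative assumptions."""
    name: str
    cycles: int      # estimated execution cost (lower is better)
    peak_bytes: int  # estimated peak memory (lower is better)


def dominates(a: Plan, b: Plan) -> bool:
    """True if `a` is no worse than `b` on both metrics and better on one."""
    no_worse = a.cycles <= b.cycles and a.peak_bytes <= b.peak_bytes
    better = a.cycles < b.cycles or a.peak_bytes < b.peak_bytes
    return no_worse and better


def pareto_frontier(plans: List[Plan]) -> List[Plan]:
    """Keep every plan that no other plan dominates, sorted by peak memory."""
    frontier = [p for p in plans if not any(dominates(q, p) for q in plans)]
    return sorted(frontier, key=lambda p: p.peak_bytes)


# Made-up numbers: more stripes cost some cycles but save memory.
candidates = [
    Plan("no_cascade", cycles=100, peak_bytes=4096),
    Plan("cascade_2_stripes", cycles=110, peak_bytes=2048),
    Plan("cascade_4_stripes", cycles=130, peak_bytes=1024),
    Plan("bad_plan", cycles=140, peak_bytes=3000),  # worse than cascade_2_stripes on both
]
frontier = pareto_frontier(candidates)
```

A tuner that optimizes only for performance would return `no_cascade` alone, whereas the frontier keeps all three trade-off points and leaves the final choice to whoever knows the memory budget.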

Perhaps there are solutions to some of these problems, but I currently 
envision that the cascading would break the graph into these interleaved 
'sub-ops' (acting on sub-tensors), and Ansor could then optimise each of 
these for performance.
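
As a rough illustration of the 'sub-ops on sub-tensors' idea, the NumPy sketch below runs the same two-op chain layer by layer and then interleaved over row stripes. The ops are deliberately elementwise so that no overlap between stripes is needed; `op_a`, `op_b` and the stripe size are assumptions for the example, not part of the proposal.

```python
import numpy as np


def op_a(x: np.ndarray) -> np.ndarray:
    return x * 2.0  # stand-in for the first operator in the chain


def op_b(x: np.ndarray) -> np.ndarray:
    return x + 1.0  # stand-in for the second operator in the chain


def run_layer_by_layer(x: np.ndarray):
    """Run each op over the whole tensor; the intermediate is full size."""
    t = op_a(x)
    return op_b(t), t.nbytes


def run_cascaded(x: np.ndarray, rows_per_stripe: int):
    """Interleave the two ops stripe by stripe; only one stripe of the
    intermediate tensor is alive at any time."""
    out = np.empty_like(x)
    largest_intermediate = 0
    for start in range(0, x.shape[0], rows_per_stripe):
        stripe = slice(start, start + rows_per_stripe)
        t = op_a(x[stripe])  # sub-op A on a sub-tensor
        largest_intermediate = max(largest_intermediate, t.nbytes)
        out[stripe] = op_b(t)  # sub-op B consumes it straight away
    return out, largest_intermediate


x = np.arange(32, dtype=np.float32).reshape(8, 4)
full_out, full_bytes = run_layer_by_layer(x)
casc_out, casc_bytes = run_cascaded(x, rows_per_stripe=2)
```

Both versions produce the same output, but the largest intermediate shrinks from the full tensor to a single stripe. Each stripe-sized sub-op is then the kind of unit that could be handed to a per-op optimizer afterwards.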
