The limitation is perhaps more with TE/TIR than with TOPI, in that currently *all* the scheduling decisions need to happen together. The proposed changes with TensorIR would lift that constraint, but for that to actually be useful, the FuseOps pass would have to become a TIR pass rather than a Relay one; otherwise the graph is already split up before we even get the chance to schedule it.
I did consider whether Ansor might be a good fit for this, but there seemed to be a few main issues. First, for us this technique is mostly about reducing peak memory requirements (although it can also benefit performance by reducing bandwidth). Ansor currently optimizes solely for performance, whereas we'd want to keep a Pareto frontier of performance against memory usage. Second, we'd prefer not to use auto-tuning approaches except where absolutely necessary. Finally, this involves considering many different ways to break a graph down into sets of cascaded subgraphs, but my understanding is that Ansor only uses (or maybe will only use) a fixed set of rules to create the subgraphs. Perhaps there are solutions to some of these problems, but what I currently envision is that the cascading would break the graph into these interleaved 'sub-ops' (acting on sub-tensors), and Ansor could then optimize each of these for performance.
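
To make the Pareto frontier point concrete, here is a minimal Python sketch of how candidate cascading plans could be filtered so that only the non-dominated performance/memory trade-offs are kept. It is purely illustrative and not part of TVM or Ansor: the `Plan` type, its `cycles` and `peak_memory` fields, and the numbers are all hypothetical.

```python
from typing import List, NamedTuple


class Plan(NamedTuple):
    """Hypothetical summary of one way of cascading a graph."""
    name: str
    cycles: int        # estimated execution cost, lower is better
    peak_memory: int   # estimated peak memory in bytes, lower is better


def pareto_frontier(plans: List[Plan]) -> List[Plan]:
    """Keep only plans that no other plan beats on both cycles and memory."""
    frontier = []
    best_memory = None
    # Visit plans from fastest to slowest. A slower plan is only worth
    # keeping if it uses strictly less memory than every faster plan.
    for plan in sorted(plans, key=lambda p: (p.cycles, p.peak_memory)):
        if best_memory is None or plan.peak_memory < best_memory:
            frontier.append(plan)
            best_memory = plan.peak_memory
    return frontier


# Made-up candidates, for illustration only.
candidates = [
    Plan("unfused", cycles=100, peak_memory=4096),
    Plan("cascade_a", cycles=120, peak_memory=1024),
    Plan("cascade_b", cycles=130, peak_memory=2048),  # dominated by cascade_a
    Plan("cascade_c", cycles=150, peak_memory=512),
]

for plan in pareto_frontier(candidates):
    print(plan.name, plan.cycles, plan.peak_memory)
```

A single-objective tuner would keep only `unfused` here, whereas the frontier also retains the slower but smaller `cascade_a` and `cascade_c`, so a plan can be chosen later to fit a given memory budget.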