I think it has to do with the amount of complexity we want to manage at the 
low level. While it can be attractive to put a whole graph in a single TIR 
block, doing so inevitably increases the effort needed to support scheduling 
for such blocks.

The Relay representation is also useful for capturing high-level knowledge, 
e.g. cutting out a subgraph and implementing conv2d with Winograd, a decision 
that is not available at the TIR level.

I think we should still rely on partitioning into subfunctions at the Relay 
level, but improve the flexibility of the partitioner so that we can carve out 
patterns like depthwise-conv -> conv2d, or even the full graph if necessary.

As we can see, there are two extreme points in the design space:
- D0: relying on TIR for everything (good for low-level decisions).
- D1: relying mostly on Relay, with one subfunction per operator (good for 
high-level decisions).

Both points have their weaknesses. Our current solution sits somewhere in the 
middle, relying on Relay to produce a subfunction that feeds into TIR. I believe 
the most important thing is to build a solution that can move freely between D0 
and D1. Additionally, it might be helpful to expose TIR-level information at the 
graph level, e.g. use the compute declaration of an op to derive its semantics 
and expose them at the graph level.

In this case, having the ability to customize the fusor (via a pattern 
language) opens the door to such moves. Such a blended solution is likely to 
be more practical.
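To make the idea concrete, here is a deliberately simplified toy sketch (not TVM's actual pattern-language API) of a pattern-driven partitioner: a user-supplied pattern such as `depthwise_conv2d -> conv2d` decides which consecutive ops get grouped into one subfunction, while everything else falls back to single-op subfunctions. The op names and the list-based graph are illustrative assumptions only.

```python
# Toy sketch of pattern-driven partitioning (NOT TVM's real pattern language).
# A pattern is just a sequence of op names; matched runs are fused into one
# group, everything else becomes a single-op group.

def partition(ops, pattern):
    """Greedily group consecutive ops matching `pattern` into one subfunction."""
    groups, i, n = [], 0, len(pattern)
    while i < len(ops):
        if ops[i:i + n] == pattern:
            groups.append(ops[i:i + n])   # fuse the whole matched pattern
            i += n
        else:
            groups.append([ops[i]])       # fall back to a single-op subfunction
            i += 1
    return groups

graph = ["add", "depthwise_conv2d", "conv2d", "relu"]
print(partition(graph, ["depthwise_conv2d", "conv2d"]))
# → [['add'], ['depthwise_conv2d', 'conv2d'], ['relu']]
```

The point is that the fusion granularity is data, not code: swapping in a longer pattern (up to the whole graph) or an empty one moves the system toward D0 or D1 without changing the partitioner itself.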





---
[Visit Topic](https://discuss.tvm.apache.org/t/rfc-cascade-scheduling/8119/8) 
to respond.
