[quote="matt-arm, post:2, topic:6867"]

it’s worth mentioning that one of the reasons all the codegens accept Relay 
rather than TIR is because BYOC is implemented in Relay

[/quote]

I agree with you that the current infrastructure seems to be limited to Relay. 
But tqchen did mention:

[quote="tqchen, post:17, topic:6680"]

we could use the generic pattern language to perform S0 and S1 and still use 
the TIR for further lowering. This kind of compositability is where we are 
heading towards

[/quote]

So we can assume that BYOC is at least planned for the TIR level. VTA shows 
the "original" way of coupling an accelerator to TVM, which would mean that a 
BYOC at the TIR level could reuse much of the conceptual design the VTA 
developers used to dock onto TVM.

I think the most natural reason why BYOC will always start at the Relay level 
is that accelerators are more often designed with the "framework" operators in 
mind, so that an accelerator encapsulates the execution of what we consider 
Relay subgraphs.

The bridge between Relay and TIR is the definition of the compute and 
scheduling rules for a given Relay operator/function. So, as a first step 
towards BYOC-TIR:

* the vendor would need to override the standard TVM translation between 
Relay operators and their TOPI implementations (see the sketch below)
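
As a minimal sketch of what such an overridden compute rule could look like 
(the operator choice, the `vendor_add` name, and the hook into the operator 
strategy are my assumptions, not an existing API):

```python
import tvm
from tvm import te

# Hypothetical vendor compute rule replacing the default TOPI one for an
# elementwise add. In TVM this would be registered through the operator
# strategy machinery instead of the stock implementation.
def vendor_add_compute(a, b):
    # te.compute builds the tensor expression that is later lowered to TIR.
    return te.compute(a.shape, lambda *i: a(*i) + b(*i), name="vendor_add")

A = te.placeholder((16, 16), name="A")
B = te.placeholder((16, 16), name="B")
C = vendor_add_compute(A, B)

s = te.create_schedule(C.op)  # trivial schedule, just to inspect the TIR
print(tvm.lower(s, [A, B, C], simple_mode=True))
```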

What I have described above is roughly what you will find in the [TOPI/nn 
folder in the codebase](https://github.com/apache/incubator-tvm/tree/master/topi/python/topi/nn) 
and, to some extent, in the [VTA part of the 
stack](https://github.com/apache/incubator-tvm/tree/master/vta/python/vta/top). 

The compute rule needs a corresponding scheduling rule at the TE level in 
order to generate a TIR description. The TE design is obviously 
vendor-specific. This is what you will find in the [TOPI/<target> folders in 
the codebase](https://github.com/apache/incubator-tvm/tree/master/topi/python/topi). 
It would be interesting to discuss whether vendor-specific TE extensions can 
be included in the BYOC concept.
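
A minimal sketch of what I mean by a vendor-specific scheduling rule, 
assuming a made-up `vendor_dma_copy` pragma (analogous to VTA's `dma_copy` 
pragma):

```python
import tvm
from tvm import te

A = te.placeholder((64, 64), name="A")
B = te.compute((64, 64), lambda i, j: A[i, j] * 2, name="B")

# The schedule is where the vendor knowledge lives: tile sizes, memory
# scopes, and pragmas that later TIR passes can pick up.
s = te.create_schedule(B.op)
xo, xi = s[B].split(B.op.axis[0], factor=16)   # tile for the accelerator
s[B].pragma(xi, "vendor_dma_copy")             # hypothetical pragma, cf. VTA's dma_copy
print(tvm.lower(s, [A, B], simple_mode=True))  # the pragma shows up as an AttrStmt
```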

For the TIR level, the Relay BYOC concept could be heavily reused (i.e. 
defining TIR patterns which should be matched, and so on). AFAIK, the TIR pass 
infra is very similar to the Relay pass infra, and therefore decisions made at 
the higher level on how to "customize" passes could be adopted at the lower 
level. To some extent, [VTA does 
this](https://github.com/apache/incubator-tvm/blob/master/vta/python/vta/build_module.py#L71).
 My biggest concern is that some of the TIR passes are "triggered" by a [pragma 
injection](https://github.com/apache/incubator-tvm/blob/master/vta/python/vta/transform.py#L274)
 and others aren't. I don't know what the underlying reason was for dividing 
them as such.
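
To make the pragma-triggered mechanism concrete, here is a rough sketch in 
the style of VTA's transform.py; the `pragma_vendor_dma_copy` key matches the 
hypothetical pragma from the schedule sketch above, and the exact way the pass 
is hooked in may differ between TVM versions:

```python
import tvm
from tvm import tir

# A TIR pass that walks the statement tree and rewrites AttrStmt nodes
# carrying the vendor pragma (schedule pragmas appear with a "pragma_"
# prefix on the attr key).
def inject_vendor_copy(func, mod, ctx):
    def _rewrite(stmt):
        if isinstance(stmt, tir.AttrStmt) and stmt.attr_key == "pragma_vendor_dma_copy":
            # A real pass would emit accelerator intrinsics here; this
            # placeholder just strips the pragma and keeps the body.
            return stmt.body
        return None  # leave everything else untouched
    new_body = tir.stmt_functor.ir_transform(func.body, None, _rewrite, ["tir.AttrStmt"])
    return func.with_body(new_body)

vendor_pass = tvm.tir.transform.prim_func_pass(inject_vendor_copy, opt_level=0)

# Hook it into lowering the same way VTA's build_module.py does:
with tvm.transform.PassContext(config={"tir.add_lower_pass": [(1, vendor_pass)]}):
    pass  # tvm.build(...) here would run the pass during lowering
```
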
What is missing in BYOC-TIR is how to incorporate a pattern-matching mechanism 
(similar in API to the one at the Relay level) in order to ease this process 
at the TIR level.
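
For reference, this is the Relay-level pattern API I have in mind; a 
TIR-level equivalent of this is what I find missing:

```python
from tvm.relay.dataflow_pattern import is_op, wildcard

# Matches a conv2d -> add (bias) -> relu chain at the Relay level.
conv = is_op("nn.conv2d")(wildcard(), wildcard())
bias = is_op("add")(conv, wildcard())
pattern = is_op("nn.relu")(bias)

# pattern.match(expr) checks an expression; pattern.partition(expr)
# pulls the matched subgraph out into its own function.
```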

Given:

* The accelerator can handle a composite pattern outside the capabilities of 
standard TVM fusion at the Relay level. Therefore the vendor designs the 
composite pattern at the Relay level and at the compute-definition level

* The accelerator can handle a composite pattern outside the capabilities of 
standard TVM at the TIR level (for example, DMA load/store). Therefore the 
vendor designs the composite pattern at the TIR level using TE

The BYOC-TIR compilation flow could follow this outline:

* The framework model is translated into Relay, and standard TVM 
optimizations can be done here
    * Let TVM handle the front-end stuff (part 1 of frontend)

* The Relay workload is searched for Relay composite patterns; if found, 
"delete" the original Relay subgraph and "insert" the compute definitions 
designed by the vendor (a sketch of this step follows after this outline)

    * Insert "your" special way of rewriting Relay graphs (part 2 of frontend)
        * For me this is, to date, the most common way of docking onto TVM 
from the outside, and what is currently available in the documentation

    * Create valid Relay graphs and continue compiling in TVM

        * I guess this could be optional if you want to generate TIR from here


* The compute definitions are lowered into a TIR representation using 
vendor-specific usage of standard TEs, vendor-specific TEs, or by bypassing 
the TE representation
    * For example, [how VTA does 
it](https://github.com/apache/incubator-tvm/tree/master/vta/python/vta/top)

* The TIR representation is adapted according to patterns defined by the 
vendor (with an API similar to the pattern language available at the Relay 
level); if found, "delete" the original TIR subgraph and "insert" the TIR 
generated/designed by the vendor
    * Insert "your" special way of handling TIR graphs (part 1 of backend)
        * Autotuner available if TE is used to generate these new TIR graphs
        * For example, [how VTA does 
it](https://github.com/apache/incubator-tvm/blob/master/vta/python/vta/transform.py)

* The TIR representation is further handled by the standard TVM stack (part 2 
of backend)

    * Array size computation/pointer arithmetic

    * Dead-code elimination

* Continue lowering from TIR -> your codegen
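
For the second step of the outline (searching the Relay workload for 
composite patterns and carving them out), here is a hedged sketch using the 
standard Relay BYOC passes; "vendor" is a placeholder codegen name and the 
pattern is just an example:

```python
import tvm
from tvm import relay
from tvm.relay.dataflow_pattern import is_op, wildcard

def conv2d_relu_pattern():
    conv = is_op("nn.conv2d")(wildcard(), wildcard())
    return is_op("nn.relu")(conv)

pattern_table = [("vendor.conv2d_relu", conv2d_relu_pattern())]

def partition_for_vendor(mod):
    # Assumes supported ops were marked with
    # tvm.ir.register_op_attr(op_name, "target.vendor", ...) beforehand.
    seq = tvm.transform.Sequential([
        relay.transform.MergeComposite(pattern_table),  # fuse composite patterns
        relay.transform.AnnotateTarget("vendor"),       # mark ops the codegen supports
        relay.transform.MergeCompilerRegions(),         # grow offload regions
        relay.transform.PartitionGraph(),               # split vendor functions out
    ])
    return seq(mod)
```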


The interesting thing is: 

* developing only part 2 of the frontend and not continuing the TVM flow is 
what current BYOC documentation shows (obviously it still needs your runtime); 
this will be the path taken by vendors with extensive prior in-house 
development of a software stack

* developing part 2 of the frontend and part 1 of the backend allows for a 
very specific, concentrated effort while piggybacking on some core TVM 
functionality across the stack; this will be the path taken by vendors with 
little prior in-house development, researchers, and hobbyists

What are your opinions on this?
@thierry




