I completely agree with breaking things down into primitive ops. Even the `relay.op.qnn` ops should be broken down into primitive ops. If a required primitive op does not exist, we can discuss it and maybe create one. I also understand the Relay fusion part. My point is a different one.
I am trying to understand when to translate directly to primitive ops and when to create a new `qnn` op that is later lowered to primitive ops by a Relay pass. If the lowering sequence is very long, it might be better to create a new `qnn` op.

PS: The first Relay pass we run can be qlower or qrewrite (it could also be part of the framework parser, if it looks ugly in build_module), and its output will consist only of existing Relay primitive ops.
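To make the second option concrete, here is a rough sketch of what a qlower-style rewrite could look like. It is plain Python over a toy expression tree, not actual Relay code: the `Var`/`Const`/`Call` classes, the `qnn.dequantize` op name and its argument order, and the `qlower` function are all assumptions made up for illustration. The only idea it is meant to show is that a high-level `qnn` op gets replaced by a sequence of existing primitive ops, so nothing after the pass has to know about `qnn`.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Const:
    value: float


@dataclass(frozen=True)
class Call:
    op: str
    args: Tuple["Expr", ...]


Expr = Union[Var, Const, Call]


def lower_dequantize(data, scale, zero_point):
    # Hypothetical lowering: real = scale * (cast(q) - zero_point),
    # expressed only with primitive ops (cast, subtract, multiply).
    shifted = Call("subtract", (Call("cast", (data,)), zero_point))
    return Call("multiply", (shifted, scale))


# One rule per high-level op; a long lowering sequence lives here
# instead of being duplicated in every framework parser.
LOWERING_RULES = {"qnn.dequantize": lower_dequantize}


def qlower(expr):
    """Rewrite every op that has a lowering rule into primitive ops."""
    if not isinstance(expr, Call):
        return expr
    args = tuple(qlower(arg) for arg in expr.args)
    rule = LOWERING_RULES.get(expr.op)
    if rule is None:
        return Call(expr.op, args)
    # Lower the result again in case a rule emits another high-level op.
    return qlower(rule(*args))


def ops_used(expr):
    """Collect the names of all ops that appear in the expression."""
    if not isinstance(expr, Call):
        return set()
    found = {expr.op}
    for arg in expr.args:
        found |= ops_used(arg)
    return found


# A parser would emit the single high-level op ...
graph = Call(
    "add",
    (Call("qnn.dequantize", (Var("x"), Const(0.5), Const(128.0))), Var("bias")),
)
# ... and the pass turns it into primitive ops only.
lowered = qlower(graph)
```

With direct translation, each framework parser would have to emit the `cast`/`subtract`/`multiply` sequence itself. With a dedicated op, the parsers stay small and the sequence is written once in the lowering rule, which matters more the longer that sequence gets.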