Re: "we might have to create a long sequence of existing Relay ops to approximate the FP32 computation".
This is certainly a problem for traditional frameworks, but it won't be a problem for TVM/Relay. Because we have automatic fusion and code generation, the long sequence of ops will be fused back into a single fused op. We can generate code that is just as efficient, and sometimes even more efficient (because we can fuse different ops together). So I would always recommend breaking things down into primitive ops where possible.

View it on GitHub: https://github.com/dmlc/tvm/issues/2351#issuecomment-507081221
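To illustrate why a long chain of primitive ops need not cost more once it is fused, here is a toy sketch in plain Python. It is not TVM's fusion pass or its code generator. The helpers `run_unfused`, `fuse`, and `run_fused` are hypothetical, and the example chain (sigmoid built from negate, exp, add, and reciprocal) is an assumption chosen only for illustration. The unfused path makes one full pass and materializes one intermediate buffer per op. The fused path composes the chain into a single elementwise function and makes one pass.

```python
import math
from typing import Callable, List

# Hypothetical elementwise "primitive op": maps one float to one float.
Op = Callable[[float], float]


def run_unfused(ops: List[Op], data: List[float]) -> List[float]:
    """Execute each op as its own pass over the data, mimicking separate kernels.

    Every op reads the previous intermediate buffer and writes a new one.
    """
    buf = data
    for op in ops:
        buf = [op(x) for x in buf]  # one full pass + one intermediate buffer per op
    return buf


def fuse(ops: List[Op]) -> Op:
    """Compose a chain of elementwise ops into a single elementwise function."""
    def fused(x: float) -> float:
        for op in ops:
            x = op(x)  # the value stays local; no intermediate buffer
        return x
    return fused


def run_fused(ops: List[Op], data: List[float]) -> List[float]:
    """Execute the whole chain in a single pass over the data."""
    f = fuse(ops)
    return [f(x) for x in data]


# Example chain: sigmoid(x) = 1 / (1 + exp(-x)) expressed as four primitive ops.
sigmoid_ops: List[Op] = [
    lambda x: -x,
    math.exp,
    lambda x: 1.0 + x,
    lambda x: 1.0 / x,
]

if __name__ == "__main__":
    data = [-2.0, 0.0, 2.0]
    print(run_unfused(sigmoid_ops, data))
    print(run_fused(sigmoid_ops, data))
```

Both paths apply the same ops in the same order, so they produce identical results. Only the memory traffic differs. In a real compiler, the fused function is emitted as one generated kernel, so the intermediates never touch main memory.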