areusch commented on pull request #34:
URL: https://github.com/apache/tvm-rfcs/pull/34#issuecomment-925204958
okay I've done a bit more research on this one. Here is the thing causing
tension here:
- at present, `tir.call_packed` codegens to `TVMFuncCall` and `TVMFuncCall`
does not in fact accept `resource_handle` as an argument. it is not possible to
supply `resource_handle` to `tir.call_packed`.
- however, that is not actually the ultimate goal of this PR. the ultimate
goal of this PR is to supply `resource_handle` to `tir.call_cpacked`, which
doesn't use `TVMFuncCall` and instead invokes the `TVMBackendPackedCFunc`
directly.
- this basically works in a very narrow set of use cases which meet all of
these conditions:
a) when we don't need `TVMFuncCall` because we can get
`TVMBackendPackedCFunc*` directly
b) when we know that we have the `resource_handle` that PackedFunc
expects.
c) when the C interface API to AOT is used and therefore AOT Executor
has a `void* context` to be supplied to device kernel launch PackedFunc
This really has quite a narrow use case: in the C runtime with the AOT
executor and when the C interface API is used. Other use cases disqualify this
usage pattern:
- If the PackedFunc API is used we currently have no good way to pass the
proposed `TVM<Model>_Devices` struct into AOTExecutor. We could pass it as
OpaqueHandle if we had to.
- If the C++ runtime is used, in _most_ cases we cannot get the
`TVMBackendPackedCFunc*` directly and must operate with the opaque
`TVMFunctionHandle` and `TVMFuncCall`
- It is possible that within a given compiled AOT module, we could employ
`tir.call_cpacked` to refer to other DSO-exportable functions directly.
However, this wouldn't necessarily extend to non-DSO-exportable functions,
which in any case typically gain the `context` closure from `sptr_to_self`
and/or a `this` capture. Also, doing so would hit the linker limitations
discussed in
https://discuss.tvm.apache.org/t/introduce-artifact-a-container-for-generated-code/11099/2.
Taken together, this means we need the following restrictions on this usage
pattern:
1. `-runtime` must be `c` due to the final bullet point above.
2. `-executor` must be `aot`.
3. `-unpacked-api` must be `0`. When `-unpacked-api` is `1`, we just append
this to the extern call args, correct?
4. `-interface-api` is likely `c`. As discussed above, this _could_ work
with `packed`, but the awkward OpaqueHandle dtype must be used.
Considering this altogether, this is essentially an attempt to simplify the
multi-`Module` late-bound import system for use with resource-constrained
embedded devices. The key piece of missing information needed without such a
system is the `context` pointer. I see two routes forward here:
1. We add `resource_handle` to `tir.call_cpacked` and resolve the
incongruity between usage for a device kernel function and the current usage in
the C Runtime as `Module*`.
2. We just lower the extern schedule to a CallNode whose `args` includes an
OpaqueHandle at a position which the driver is aware of. For instance,
following the current suggestion, such OpaqueHandle would be the final arg. The
question here is how to indicate to the executor that such argument should be
filled from the `TVMDevices` struct. This could be done either as an attr of
the CallNode or by introducing a new `dtype` (I sort of favor the former, as
`void*` is exactly OpaqueHandle).
Since ultimately the driver must be written knowing where the "context" is
going to be (either as `this` member or as `resource_handle`), the driver must
also be designed around this choice. If portability between C and C++ runtimes
is desired, it will need to choose an approach that works with both. While I
appreciate that `Module` is generally overly bloated for many embedded
applications, we do rely on it for the purposes of autotvm and providing the
TVM RPC server. Moving away from it seems like it would add significant
cognitive overhead to the design of microTVM. Given we may need
`resource_handle` to be the `Module*` should `TVMBackendGetFuncFromEnv` need to
be implemented on microTVM (admittedly, this means that the Module import
mechanism is in use which is even more bloated), it kind of seems like we
should avoid trying to repurpose `resource_handle` here and just adopt route 2
above.
Ultimately at tension here is `TVMBackendGetFuncFromEnv` and the need to
provide a less bloated `context` for use with the TVM C runtime. I'm not sold
on this opinion, just writing down my thoughts. Perhaps others could weigh in
here. Supporting Module's import mechanism on the C runtime is indeed
considerable extra complexity for questionable benefit. And, this RFC is part
of a larger solution which supplants the need for Module import. Perhaps the
outcome of the Article RFC (e.g. decision on what the long-term expectations of
the compiler are on the module load process) also play in here as well.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]