areusch commented on a change in pull request #7785:
URL: https://github.com/apache/tvm/pull/7785#discussion_r614264731
##########
File path: src/target/source/codegen_c_host.cc
##########
@@ -40,13 +40,16 @@ namespace codegen {
 CodeGenCHost::CodeGenCHost() { module_name_ = GetUniqueName("__tvm_module_ctx"); }
 
-void CodeGenCHost::Init(bool output_ssa, bool emit_asserts, std::string target_str) {
+void CodeGenCHost::Init(bool output_ssa, bool emit_asserts, bool is_aot_executor,
+                        std::string target_str) {
Review comment:
>> it might be harder for us to handle device functions that does not directly corresponds to a symbol
> Could you give an example for such function?
I think @tqchen is referring to functions which offload compute to devices, e.g. those produced by a backend other than `llvm` or `c`: BYOC for µTVM, or CUDA, Vulkan, etc. on a traditional OS.

The thing is, as I mentioned in the discuss forum post, these functions are actually launched from the `target_host` (e.g. CPU) by calling a closure returned from `GetFunction`. On a traditional OS, this `GetFunction` + closure pair serves both to program the accelerator and to launch the compute (the split between the two is unspecified, but typically `GetFunction` does the programming).
In micro-land, I don't recommend we adopt any sort of closure, and I also think it's likely that the "programming" part will happen as part of compilation or as part of Module initialization. So I'd recommend we assume that we'll move the "accelerator programming" step _at least_ into some Module-init function; it may even be handled outside the executor. Given that, `GetFunction` is essentially a no-op, and it makes sense to me to presume that the PackedFunc impl will launch and block on accelerated compute in these cases. As we move to handling asynchronous execution, we may need to modify this interface (but that will likely amount to a "launch" and a "sync" function, so not too different).
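To make the synchronous-launch assumption concrete, here's a minimal C sketch of that pattern: a Module-init function performs the accelerator "programming" once, and the operator entry point simply launches and blocks. All names here (`module_init`, `accel_load_program`, etc.) are hypothetical illustrations, not TVM APIs.

```c
#include <assert.h>

/* Stand-ins for the two halves of device offload discussed above.
 * g_programmed models "has the accelerator been programmed yet". */
static int g_programmed = 0;

static int accel_load_program(void) {  /* the "programming" step */
  g_programmed = 1;
  return 0;
}

static int accel_run_blocking(const float* in, float* out, int n) {
  /* stand-in for "launch and block on accelerated compute" */
  for (int i = 0; i < n; ++i) out[i] = in[i] * 2.0f;
  return 0;
}

/* Module-init: runs once, outside the executor's hot path, so
 * GetFunction can become a no-op. */
int module_init(void) { return accel_load_program(); }

/* The operator entry point is then a plain blocking call: no closure,
 * no per-call GetFunction lookup. */
int my_accel_op(const float* in, float* out, int n) {
  assert(g_programmed && "module_init must run first");
  return accel_run_blocking(in, out, n);
}
```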
>> Given the potential need to have a different path on the embedded setting(to avoid string lookup and use symbol), we should clarify and document what is the CRT standard so that the new code generator can refer to
> So, about this, for me the "runtime" of the graph is a triplet of:
>
> Calling convention (registry, dynamic or plain)
> Backend functions (platform specific vs defined in TVM)
> Graph execution (aot vs graph)
"Runtime" has been pretty overloaded in TVM, so given the recent PR renaming to GraphExecutor, I'd prefer we use "runtime" to refer to the `c_runtime_api.h` and `c_backend_api.h` implementations and their dependencies. GraphExecutor is built on top of that, as AOTExecutor should be, so I think "runtime" should exclude the Executor.
I agree it's going to be an impossibly long PR if we try to fit in the full, general AOT integration, and I don't think we should do that. I suggest that here, we just implement the logic needed to produce the top-level TIR function and test it, then consider things like changes to runtime memory management in follow-on PRs. This might be going a bit too far backwards, but if it were to simplify things, we could even switch to emitting `tir.call_packed_lowered` in the AOT codegen and implement the direct call as a follow-on PR.
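For illustration, here is roughly what the two call paths boil down to in C. The names loosely mimic `c_backend_api.h`, but the lookup function is a simplified stand-in so the sketch is self-contained; don't read the exact signatures as TVM's.

```c
#include <stddef.h>
#include <string.h>

/* Simplified packed-function signature for this sketch only. */
typedef int (*PackedCFunc)(void* args, int num_args);

/* A generated operator. */
int fused_add(void* args, int num_args) {
  (void)args; (void)num_args;
  return 0;
}

/* --- Direct ("plain") calling convention: the generated caller
 *     references the operator's symbol; the linker resolves it. --- */
int run_direct(void) {
  return fused_add(NULL, 0);
}

/* --- tir.call_packed_lowered style: resolve the function by name at
 *     runtime, roughly what TVMBackendGetFuncFromEnv does. The string
 *     lookup is the cost the embedded path wants to avoid. --- */
static int lookup(const char* name, PackedCFunc* out) {
  if (strcmp(name, "fused_add") == 0) { *out = fused_add; return 0; }
  return -1;  /* not found */
}

int run_packed(void) {
  PackedCFunc f = NULL;
  if (lookup("fused_add", &f) != 0) return -1;
  return f(NULL, 0);
}
```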
> So, I am not sure we can come up in day1 with a full general aot integration. I would say:
>
> Let's stick with the current AOT definition for now (which is for sure more general that we had in mind at the beginning :) )
> I will remove the is_aot from the codegen, but I will use a more generic use_plain_calling_convention
> Let's document the exact definition of CRT and AOT and let's try to generalize later
I think this makes sense--let's clarify a couple points:
- definition of AOT: just the piece that generates the top-level TIR
function to serve as an Executor
- definition of CRT: `c_runtime_api.h` and `c_backend_api.h` implementation
and dependencies of those
Now as to what we should put in this PR to move forward--the rest of this
comment is my proposal, feel free to debate.
I'd suggest we just limit this PR to the top-level TIR function which serves
as part of the Executor implementation (e.g. `Executor#Run`). That would mean
deferring:
- Memory management changes--just use `TVMBackendAllocWorkspace` in this PR
even for intermediate graph tensors
- firmware-facing API--just enough to make test cases for the top-level TIR
function run
- calling convention: can use what you guys have put together here, or can even emit `tir.call_packed_lowered` nodes if that's easier. I think they are about equivalent, except that `call_packed_lowered` uses `TVMBackendGetFuncFromEnv`.
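As a sketch of what "just use `TVMBackendAllocWorkspace` even for intermediate graph tensors" would look like: the generated top-level function allocates each intermediate from the workspace, runs the ops, then frees it. The signatures follow `c_backend_api.h` as I understand them, but the malloc-backed bodies here are stand-ins so the example stands alone.

```c
#include <stdint.h>
#include <stdlib.h>

/* Stub workspace implementation; in TVM these come from the CRT. */
void* TVMBackendAllocWorkspace(int device_type, int device_id, uint64_t nbytes,
                               int dtype_code_hint, int dtype_bits_hint) {
  (void)device_type; (void)device_id; (void)dtype_code_hint; (void)dtype_bits_hint;
  return malloc((size_t)nbytes);
}

int TVMBackendFreeWorkspace(int device_type, int device_id, void* ptr) {
  (void)device_type; (void)device_id;
  free(ptr);
  return 0;
}

/* Shape of a top-level run function: one workspace allocation for the
 * intermediate tensor between two fused ops, freed before returning.
 * Device type 1 is kDLCPU. */
int run(const float* input, float* output, int n) {
  float* tmp = (float*)TVMBackendAllocWorkspace(
      /*device_type=*/1, /*device_id=*/0,
      (uint64_t)n * sizeof(float), /*dtype_code_hint=*/2, /*dtype_bits_hint=*/32);
  if (tmp == NULL) return -1;
  for (int i = 0; i < n; ++i) tmp[i] = input[i] + 1.0f;   /* op 1 */
  for (int i = 0; i < n; ++i) output[i] = tmp[i] * 2.0f;  /* op 2 */
  return TVMBackendFreeWorkspace(1, 0, tmp);
}
```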
Near-future directions (i.e. stuff implemented here that I'm not saying we should drop, just move to the next PR):
- Memory management: let's discuss this soon and implement it as a GraphPlanMemory change. That should get us a solution that works for all executors and is extensible, should users wish to implement their own allocation scheme. I don't view the existing page-based allocator as particularly fit for production standalone inference.
- PackedFunc invocation from other PackedFunc: let's discuss this with an
eye to both BYOC and `c`/`llvm` modules, and come up with an efficient solution
that makes sense for a pure-C world. This mostly doesn't impact the
firmware-facing API, so let's initially discuss this separately.
- firmware-facing API: Let's also discuss this, either as part of your
initial RFC or as a new thread.
Would this make sense to you guys as a way forward? Happy to discuss other
options too. I do think it'd be better to merge a part of this PR and move
forward with the others, rather than create one giant PR and discuss all of the
pieces at once. At the same time, I do realize we should ensure we're working
towards a vision that makes sense for microcontrollers, which involves many
pieces given how different the deployment environment is from traditional OS.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]