tqchen edited a comment on issue #9022:
URL: https://github.com/apache/tvm/issues/9022#issuecomment-920896447
So in the above post I tried to summarize the state. Now let me try to share
some of my thoughts based on the summary.
First of all, R0 and R1 are not that different in nature. Both try to
introduce two separate scopes that bring different behavior. The main
question boils down to how we name the "global" scope.
Per allocate semantics, we treat "global" as normal CPU memory which can
come from stack or platform specific allocation. The system can choose the best
way of doing such lowering. Always lowering to TBAW (TVMBackendAllocWorkspace) is indeed more general for
the need of N1. However, the need N0 would favor stack allocation when
possible. Note that we will likely need a related behavior for micro devices as
well when generating operator kernels.
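To make the trade-off concrete, here is a minimal sketch of the kind of choice the system could make for a "global" allocation. It is plain Python rather than TVM code; `lower_global_alloc` and `MAX_STACK_BYTES` are hypothetical names, and the threshold value is an assumption, not what TVM uses.

```python
from typing import Optional

# Hypothetical limit for illustration; a real limit would be target-specific.
MAX_STACK_BYTES = 1024


def lower_global_alloc(nbytes: Optional[int]) -> str:
    """Pick a lowering for a "global" allocation.

    nbytes is None when the size is not a compile-time constant.
    """
    # Small, statically sized buffers can live on the stack (favors N0).
    if nbytes is not None and nbytes <= MAX_STACK_BYTES:
        return "stack"
    # Everything else falls back to the platform-specific workspace call.
    return "workspace"
```

Under this sketch, a caller that needs N1 cannot rely on "global" alone, because small buffers may still end up on the stack.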
While it is OK to differentiate stack allocated memory from a platform
specific one, doing so would bring additional burden to the user and would
require a significant refactor of the operator implementations.
The main requests so far come from the need of N1. In that case, it would be
easy for the AOT generator to allocate memory with a special tag
("global.workspace") that enforces workspace allocation, since in this
setting there is a single expected behavior.
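As a rough illustration of how such a tag could be recognized, the sketch below splits a scope string into a base scope and a tag. Only the "global.workspace" string comes from the discussion; `split_scope` and `must_use_workspace` are hypothetical helpers written in plain Python, not TVM API.

```python
from typing import Tuple


def split_scope(scope: str) -> Tuple[str, str]:
    """Split "base.tag" into (base, tag); tag is "" when there is no dot."""
    base, _, tag = scope.partition(".")
    return base, tag


def must_use_workspace(scope: str) -> bool:
    """True only for the tagged scope that enforces workspace allocation."""
    base, tag = split_scope(scope)
    return base == "global" and tag == "workspace"
```

With this split, plain "global" keeps its current flexible behavior, and only buffers the AOT generator tags explicitly are forced to the workspace path.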
So my suggestion would be R1+R2, as it helps to resolve the need in a way
that is compatible with the current semantics and use cases. It will also open
the door to more scope-dependent optimizations in the future.