tqchen commented on issue #9022:
URL: https://github.com/apache/tvm/issues/9022#issuecomment-920896447
So in the above post I tried to summarize the state. Now let me try to share
some of my thoughts based on the summary.
First of all, R0 and R1 are not that different in nature. Both try to
introduce two separate scopes that bring different behavior. The main
question boils down to how we name the "global" scope.
Per allocate semantics, we treat "global" as normal CPU memory, which can
come from the stack or from a platform-specific allocation. The system can
choose the best way of doing such lowering. However, memory that is
accessible from the NPU is more specialized and could use a special memory
tag for differentiation purposes.
While it is OK to differentiate stack-allocated memory from
platform-specific memory, doing so would put an additional burden on the
user and would require a significant refactor of the operator
implementations.
Note that we will likely need related behavior for micro devices as well,
to meet the need of N0. The main requests so far come from the need of N1.
In that case, it would be easy for the AOT generator to allocate memory
with a special tag ("global.workspace") that enforces workspace allocation,
since in this setting there is a single expected behavior.
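To make the scope-dependent behavior concrete, here is a minimal sketch in
plain Python. It is not TVM code: the `Allocate` class, the `lower_allocate`
function, the `STACK_LIMIT_BYTES` cutoff, and the strategy names are all
hypothetical and exist only to illustrate the idea that plain "global" leaves
the choice to the system while "global.workspace" has a single expected
behavior.

```python
from dataclasses import dataclass

# Hypothetical cutoff chosen only for illustration; not a TVM constant.
STACK_LIMIT_BYTES = 1024


@dataclass
class Allocate:
    """Toy stand-in for an allocation: a size plus a storage scope tag."""
    nbytes: int
    scope: str = "global"


def lower_allocate(alloc: Allocate) -> str:
    """Pick a (hypothetical) lowering strategy from the storage scope tag."""
    if alloc.scope == "global.workspace":
        # Special tag: single expected behavior, always a workspace allocation.
        return "workspace"
    if alloc.scope == "global":
        # Plain "global": the system is free to choose the best lowering,
        # here the stack for small buffers and a platform-specific
        # allocation otherwise.
        return "stack" if alloc.nbytes <= STACK_LIMIT_BYTES else "platform"
    raise ValueError(f"unhandled storage scope: {alloc.scope!r}")
```

With this shape, existing operator implementations that use "global" stay
untouched, and only the generator that needs the enforced behavior has to
emit the special tag.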
So my suggestion would be R1+R2, as it helps to resolve the need in a way
that is compatible with the current semantics and use cases. It will also
open the door to more scope-dependent optimizations in the future.