altanh commented on issue #8579:
URL: https://github.com/apache/tvm/issues/8579#issuecomment-888686075


After analysis, we believe this regression is due to a combination of the changes in #8486 and an inefficient loop in the `check_grad` function:
   
1. #8486 replaced the previous global (or thread-local) `CompileEngine` with a fresh `TECompiler` per `Interpreter` instance. As a result, lowered functions are no longer cached (for given input types) globally across all interpreters.
   
2. In `check_grad` at https://github.com/apache/tvm/blob/720e7b1ebd9b789a1100dee7536d0633c7941dd1/python/tvm/relay/testing/__init__.py#L163, notice that the forward function is recompiled by the interpreter **executor** for **every element of the input tensor**. Importantly, the interpreter executor actually creates a new `Interpreter` object for every evaluated expression (and hence loses the cache), causing a complete recompile on each evaluation; see the sketch after this list.
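
To make the interaction of these two points concrete, here is a minimal sketch of the anti-pattern, assuming the `relay.create_executor`/`evaluate` API; the toy function is purely illustrative and not the actual `check_grad` code:

```python
import numpy as np
import tvm
from tvm import relay

# Illustrative toy forward function, standing in for the one check_grad tests.
x = relay.var("x", shape=(4,), dtype="float32")
fwd_func = relay.Function([x], x * x)

executor = relay.create_executor("debug", device=tvm.cpu(0), target="llvm")
data = np.random.rand(4).astype("float32")

# Anti-pattern: each evaluate() call constructs a fresh Interpreter whose
# per-instance TECompiler starts with an empty cache, so the function is
# lowered and compiled from scratch on every loop iteration.
for _ in range(data.size):
    out = executor.evaluate(fwd_func)(data)
```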
   
## Proposed Solution
The second point is easily rectified by hoisting the forward function evaluation out of the hot loop, since the function doesn't change; it only needs to be evaluated once per device and target combination (see the sketch below). This will immediately fix the extreme regression. A PR will be posted soon.
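
A minimal sketch of the hoisted loop, under the same illustrative assumptions as the sketch above:

```python
import numpy as np
import tvm
from tvm import relay

# Same illustrative toy forward function as before.
x = relay.var("x", shape=(4,), dtype="float32")
fwd_func = relay.Function([x], x * x)

executor = relay.create_executor("debug", device=tvm.cpu(0), target="llvm")
data = np.random.rand(4).astype("float32")

# Fix: evaluate (and thus compile) the forward function exactly once, then
# reuse the resulting packed function inside the hot loop. Compilation now
# happens once per (device, target) combination instead of once per element.
fwd = executor.evaluate(fwd_func)
for _ in range(data.size):
    out = fwd(data)
```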
   
Regarding the first point, we would like to move away from relying on hidden global caching for performance as much as possible, as this has caused confusion in the past. We will therefore not modify the new interpreter behavior.

