Lunderberg commented on issue #17357:
URL: https://github.com/apache/tvm/issues/17357#issuecomment-2341048341

   This is a bit of a bug and a bit of an ordering dependency.
   
   1. The `LambdaLift` pass extracts local lambda functions into the module.  
However, `FuseOps` and `FuseOpsByPattern` use local lambda functions to 
represent functions that will be replaced with specific kernel invocations.
   2. The `AllocateWorkspace` pass adds a new `workspace` parameter to all 
top-level functions that have a `"WorkspaceSize"` attribute, and updates all 
other functions to provide the new workspace   However, if there is a call from 
one function with a `"WorkspaceSize"` attribute to another such function, it 
gets left with a dangling `GlobalVar`.
   3. The `FuseTIR` pass 
   
   There's a couple of options for short-term fixes, and a couple of options 
for long-term fixes.
   
   * Short-term
       * After calling `FuseOpsByPattern` and 
`relax.backend.contrib.cutlass.annotate_workspace`, immediately call 
`AllocateWorkspace`.  This ensures that the annotations are correct at the time 
when they are used.
       * After calling `AllocateWorkspace`, immediately call 
`relax.transform.RunCodegen`.  This ensures that the local lambda function is 
present when the cutlass codegen looks for it.
   
   * Medium-term
       * Update `LambdaLift` to ignore lambda functions that have the 
`tvm::relax::attr::kPrimitive` attribute.  This would prevent it from lifting 
out a function that is intended for use by `RunCodegen`.  This would work, but 
would be additional cross-talk between otherwise unrelated transforms.
       * Update the way in which `AllocateWorkspace` locates functions to be 
updated, with a top-down approach rather than bottom-up.  Instead of first 
updating the functions that require a workspace and then updating their 
callers, it would start at the externally-exposed functions, and walk along the 
Relax call graph to find callees that should be updated.  This would ensure 
that all caller/callee pairs are updated at the same time, preventing dangling 
pointers.
   
   * Long-term
       * Update `FuseOpsByPattern` to include the workspace.  After 
`FuseOpsByPattern`, the workspace would be expressed explicitly as an 
allocation, and the `"WorkspaceSize"` attribute would never be generated.  The 
various workspaces would then be replaced by a single workspace in the 
`StaticPlanBlockMemory` pass.
   
   Unfortunately, I don't have time to implement the medium/long term solutions 
at the moment, but could help guide somebody in their implementation if there's 
interest.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to