Lunderberg commented on issue #17357:
URL: https://github.com/apache/tvm/issues/17357#issuecomment-2341048341
This is a bit of a bug and a bit of an ordering dependency.
1. The `LambdaLift` pass extracts local lambda functions into the module.
However, `FuseOps` and `FuseOpsByPattern` use local lambda functions to
represent functions that will be replaced with specific kernel invocations.
2. The `AllocateWorkspace` pass adds a new `workspace` parameter to every
top-level function that has a `"WorkspaceSize"` attribute, and updates all
other functions that call them to pass the new workspace. However, if a
function with a `"WorkspaceSize"` attribute calls another such function, that
call site is left with a dangling `GlobalVar`.
3. The `FuseTIR` pass
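The dangling-`GlobalVar` hazard in item 2 can be sketched with a toy model. This is plain Python dicts standing in for a module of functions, not TVM's actual data structures or API; the function names (`entry`, `a`, `b`), the dict schema, and the `"+workspace"` marker are all made up for illustration:

```python
# Toy module: "entry" calls "a", and "a" calls "b"; both "a" and "b"
# are annotated with a workspace size.
module = {
    "entry": {"attrs": {}, "calls": ["a"]},
    "a": {"attrs": {"WorkspaceSize": 1024}, "calls": ["b"]},
    "b": {"attrs": {"WorkspaceSize": 512}, "calls": []},
}

def allocate_workspace_bottom_up(mod):
    """First rewrite annotated functions, then patch their callers."""
    rewritten = {name for name, fn in mod.items()
                 if "WorkspaceSize" in fn["attrs"]}
    # Step 1: every annotated function gains a workspace parameter.
    for name in rewritten:
        mod[name].setdefault("params", []).insert(0, "workspace")
    # Step 2: patch call sites, but only in functions that were not
    # themselves rewritten.  This mirrors the bug: a call made from one
    # annotated function to another is never revisited, so it still
    # refers to the callee's old signature.
    for name, fn in mod.items():
        if name in rewritten:
            continue
        fn["calls"] = [f"{c}+workspace" if c in rewritten else c
                       for c in fn["calls"]]
    return mod

mod = allocate_workspace_bottom_up(module)
print(mod["entry"]["calls"])  # ['a+workspace']  -- patched correctly
print(mod["a"]["calls"])      # ['b']            -- stale: 'b' now expects a workspace
```

The stale `['b']` entry is the toy analogue of the dangling `GlobalVar`: the caller was rewritten first and its call sites were never revisited.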
There are a few options for fixes, ranging from short-term workarounds to
longer-term changes.

* Short-term
  * After calling `FuseOpsByPattern` and
    `relax.backend.contrib.cutlass.annotate_workspace`, immediately call
    `AllocateWorkspace`. This ensures that the annotations are correct at the
    time when they are used.
  * After calling `AllocateWorkspace`, immediately call
    `relax.transform.RunCodegen`. This ensures that the local lambda function
    is still present when the cutlass codegen looks for it.
* Medium-term
  * Update `LambdaLift` to ignore lambda functions that have the
    `tvm::relax::attr::kPrimitive` attribute. This would prevent it from
    lifting out a function that is intended for use by `RunCodegen`. It would
    work, but it would introduce additional cross-talk between otherwise
    unrelated transforms.
  * Update the way in which `AllocateWorkspace` locates functions to be
    updated, using a top-down approach rather than a bottom-up one. Instead
    of first updating the functions that require a workspace and then
    updating their callers, it would start at the externally-exposed
    functions and walk the Relax call graph to find callees that should be
    updated. This would ensure that every caller/callee pair is updated at
    the same time, preventing dangling references.
* Long-term
  * Update `FuseOpsByPattern` to include the workspace. After
    `FuseOpsByPattern`, the workspace would be expressed explicitly as an
    allocation, and the `"WorkspaceSize"` attribute would never be generated.
    The various workspaces would then be replaced by a single workspace in
    the `StaticPlanBlockMemory` pass.
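The top-down walk in the second medium-term option can be sketched with the same kind of toy module model. Again, this is plain Python dicts rather than TVM's call-graph machinery; the function names and the `exposed` flag are hypothetical:

```python
from collections import deque

# Toy module, as before: "entry" is externally exposed and calls "a",
# which in turn calls "b"; both "a" and "b" need a workspace.
module = {
    "entry": {"attrs": {"exposed": True}, "calls": ["a"]},
    "a": {"attrs": {"WorkspaceSize": 1024}, "calls": ["b"]},
    "b": {"attrs": {"WorkspaceSize": 512}, "calls": []},
}

def allocate_workspace_top_down(mod):
    """Walk the call graph from the externally-exposed functions."""
    roots = [n for n, fn in mod.items() if fn["attrs"].get("exposed")]
    queue, seen = deque(roots), set(roots)
    while queue:
        caller = queue.popleft()
        patched = []
        for callee in mod[caller]["calls"]:
            # The callee's signature and the caller's call site are
            # updated in the same step, so no call site is ever left
            # pointing at a stale signature.
            if "WorkspaceSize" in mod[callee]["attrs"]:
                mod[callee]["has_workspace_param"] = True
                patched.append(f"{callee}+workspace")
            else:
                patched.append(callee)
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
        mod[caller]["calls"] = patched
    return mod

mod = allocate_workspace_top_down(module)
print(mod["entry"]["calls"])  # ['a+workspace']
print(mod["a"]["calls"])      # ['b+workspace'] -- no stale call site
```

Because every edge is visited from its caller's side, the call from one workspace-annotated function into another is patched along with everything else, which is exactly what the bottom-up ordering misses.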
Unfortunately, I don't have time to implement the medium- or long-term
solutions at the moment, but I could help guide somebody through their
implementation if there's interest.