Lunderberg commented on PR #77:
URL: https://github.com/apache/tvm-rfcs/pull/77#issuecomment-1162392893
> Introducing changes to TIR would need some additional thought that
> deserves extra consideration, due to the N*M complexity (where N is the
> number of TIR possibilities and M is the number of primitives to be supported)
> that needs to be handled in implementation (by backend implementers and
> primitive implementers).
This was part of the design considerations: minimizing the impact of the
proposed changes on primitives, lowering transformations, and backends.
* The `BufferConstraint` annotations do not need specific handling at the
codegen level, as they are only present to enable compile-time optimizations.
* Use of the `BufferConstraint` hints would occur within existing utilities,
primarily as additional information available to `arith::Analyzer`.
This minimizes the need for other primitives/transforms to be aware of the
buffer constraints, while still letting them benefit from those constraints.
* The `T.undef()` built-in does not need specific handling at the codegen
level, as it is removed during lowering.
* The `T.undef()` built-in does not require specific handling from other
primitives, as stores of `T.undef()` can be treated the same as stores of any
other value.
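To make this division of responsibility concrete, here is a minimal pure-Python toy. It is not TVM code: `UNDEF`, `Store`, `reorder_stores`, `lower_remove_undef`, and `ConstraintAnalyzer` are hypothetical stand-ins for `T.undef()`, a buffer store, an unrelated primitive, a lowering pass, and an `arith::Analyzer` that has access to a padding constraint.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Union


class Undef:
    """Hypothetical stand-in for T.undef(): a value left unspecified."""

    def __repr__(self) -> str:
        return "T.undef()"


UNDEF = Undef()


@dataclass
class Store:
    buffer: str
    index: int
    value: Union[int, Undef]


def reorder_stores(stmts: List[Store]) -> List[Store]:
    # An unrelated primitive: it never inspects `value`, so stores of UNDEF
    # need no special case and are handled like any other store.
    return sorted(stmts, key=lambda s: (s.buffer, s.index))


def lower_remove_undef(stmts: List[Store]) -> List[Store]:
    # Lowering: storing an unspecified value places no requirement on memory,
    # so the store is dropped and codegen never sees UNDEF.
    return [s for s in stmts if s.value is not UNDEF]


@dataclass
class ConstraintAnalyzer:
    """Toy analyzer that consults per-buffer padding constraints."""

    extent: Dict[str, int]     # logical (unpadded) length of each buffer
    pad_value: Dict[str, int]  # value the constraint guarantees in the padding

    def known_value(self, buffer: str, index: int) -> Optional[int]:
        # A read past the logical extent lands in padding whose contents the
        # constraint fixes. Anything else is unknown to this toy analyzer.
        if buffer in self.pad_value and index >= self.extent[buffer]:
            return self.pad_value[buffer]
        return None


stmts = [Store("A", 7, UNDEF), Store("A", 0, 5)]
print(lower_remove_undef(reorder_stores(stmts)))  # [Store(buffer='A', index=0, value=5)]
analyzer = ConstraintAnalyzer(extent={"A": 6}, pad_value={"A": 0})
print(analyzer.known_value("A", 7))  # 0
print(analyzer.known_value("A", 3))  # None
```

Only the lowering step and the analyzer know about the new concepts. The reordering primitive is unchanged, which is the property the bullets above aim for.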
> Right now it is possible to do non-local constraint rewriting flows as
> part of the graph pass. Note that while E1 is indeed less "compact" on one
> hand, we can use it to reconstruct the desirable compact data
> structure (something like BufferConstraint that represents the layout mapping)
> that we can use to flow the decisions across the graph nodes during the pass.
I definitely agree that graph-level transforms are where the layouts and
constraints should be decided. The `BufferConstraint` annotations are not
intended as a way to override in TIR what was already decided at the graph
level, but rather a way to communicate to TIR transformations what has been
decided at the graph level.
> E1: Composing a stage that transforms the layout (a loop that represents
> the mapping)
I'm still a bit confused by this approach, specifically how one would
avoid having a separate compute definition for each workload on a new target
(initially brought up by @csullivan
[here](https://github.com/apache/tvm-rfcs/pull/77#discussion_r893701372)). In
my mind, if I'm going to compose a layout transformation stage, it would need
to be followed by a compute stage that takes the transformed layout as input. So
rather than having a single conv2d that can be generalized over layouts, each
transformed layout would still need its own compute stage.
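A minimal pure-Python sketch of this concern follows. It uses a 1-D sum in place of conv2d, and `pack`, `sum_flat`, and `sum_packed` are hypothetical names, not TVM APIs. Composing the transformation stage still requires a second compute definition written against the packed layout. That definition also silently depends on the padding value chosen by the transformation stage.

```python
from typing import List


def pack(x: List[float], lanes: int, pad_value: float = 0.0) -> List[List[float]]:
    """Layout-transformation stage: flat [N] -> [ceil(N / lanes), lanes]."""
    n_outer = (len(x) + lanes - 1) // lanes
    padded = x + [pad_value] * (n_outer * lanes - len(x))
    return [padded[i * lanes:(i + 1) * lanes] for i in range(n_outer)]


def sum_flat(x: List[float]) -> float:
    """Compute stage written against the original flat layout."""
    total = 0.0
    for v in x:
        total += v
    return total


def sum_packed(xp: List[List[float]]) -> float:
    """A second compute stage, written against the packed layout.

    It reads the padded tail unconditionally, so it is only correct because
    `pack` filled the padding with 0.0, the identity for addition.
    """
    total = 0.0
    for outer in xp:
        for v in outer:
            total += v
    return total


x = [1.0, 2.0, 3.0, 4.0, 5.0]
xp = pack(x, lanes=4)
print(xp)  # [[1.0, 2.0, 3.0, 4.0], [5.0, 0.0, 0.0, 0.0]]
print(sum_flat(x), sum_packed(xp))  # 15.0 15.0
```

Each additional layout or padding choice would add another `sum_packed`-style definition. That is the per-layout duplication of compute stages described in the preceding paragraph.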
> Note that initially such data structures do not need to live beyond the
> life of a pass, because they can be reconstructed at any time from the other
> representation.
How would this be represented while optimizing the performance of a
subgraph? My concern is how to express the non-local constraints while
keeping the optimization search space small.
* Ensure that the producer and consumer stages are within the same subgraph.
Since the constraints provided to a consumer depend not only on the producer,
but also on the constraints provided to the producer, this might require
fusing the entire end-to-end model into a single monolithic kernel.
My understanding is that this would result in a search space that is too
large to effectively optimize, though I haven't explicitly tested it.
* Insert a transformation stage into the subgraph, in which the constraint
is written. Later portions of the subgraph could then rely on the constraint
without examining other subgraphs.
This would need some way to indicate that the transformation stage
shouldn't be altered during optimization, nor be included in the
performance timing.
* Provide the graph-level constraints to a subgraph, so that it can optimize
using those constraints.
This was my intent with the `BufferConstraint` annotations, since then the
subgraphs could take advantage of those graph-level constraints.
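The third option can be sketched in pure Python. The sketch uses a hypothetical builder, `build_sum_consumer`, that receives the graph-level constraint as a plain parameter. This parameter is an illustrative assumption and not the proposed `BufferConstraint` interface. Without the constraint, the subgraph cannot see its producer and must guard every read of the padded tail. With it, the guard can be dropped.

```python
from typing import Callable, List, Optional

Packed = List[List[float]]


def build_sum_consumer(n: int, lanes: int, pad_value: Optional[float]) -> Callable[[Packed], float]:
    """Build a sum over a packed [ceil(n / lanes), lanes] buffer.

    `pad_value` plays the role of a constraint handed down from the graph
    level (hypothetical interface). None means "padding contents unknown".
    """
    if pad_value == 0.0:
        # Constraint known: padding is the additive identity, so the tail
        # can be read without a bounds check.
        def unchecked(xp: Packed) -> float:
            return sum(v for outer in xp for v in outer)
        return unchecked

    def checked(xp: Packed) -> float:
        total = 0.0
        for i, outer in enumerate(xp):
            for j, v in enumerate(outer):
                if i * lanes + j < n:  # skip padding whose contents are unknown
                    total += v
        return total
    return checked


xp_zero = [[1.0, 2.0, 3.0, 4.0], [5.0, 0.0, 0.0, 0.0]]
xp_garbage = [[1.0, 2.0, 3.0, 4.0], [5.0, 9.0, 9.0, 9.0]]
print(build_sum_consumer(5, 4, pad_value=0.0)(xp_zero))      # 15.0
print(build_sum_consumer(5, 4, pad_value=None)(xp_garbage))  # 15.0
```

The optimization is only valid if the producer actually honors the constraint. This is why the constraint has to be decided at the graph level rather than assumed locally.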
> E1 also enables some additional capabilities (e.g. expressing future
> memory remappings that do not necessarily fit into padding/packing).
Is there an existing annotation to indicate that a stage should be removed
entirely during lowering? That might be an effective way to allow more general
usage by annotating a stage that can be assumed to have been performed prior to
the subgraph. This would be a way to express the second option of an extra
transformation stage, while still providing enough information to remove the
transformation stage during lowering.
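A minimal pure-Python model of that idea follows. `Stage`, `assume_performed`, `guarantees`, `collect_assumptions`, and `lower` are hypothetical names and do not correspond to an existing TVM annotation. An annotated stage remains visible to analysis, so later stages can rely on what it establishes. It is then dropped during lowering, so it is never executed or timed.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Stage:
    name: str
    # Hypothetical annotation: the stage describes work already performed
    # before this subgraph runs, so it is kept for analysis only.
    assume_performed: bool = False
    # Facts the stage establishes about its output, e.g. {"pad_value": 0.0}.
    guarantees: Dict[str, float] = field(default_factory=dict)


def collect_assumptions(stages: List[Stage]) -> Dict[str, float]:
    """Analysis step: gather facts from annotated stages before removal."""
    facts: Dict[str, float] = {}
    for s in stages:
        if s.assume_performed:
            facts.update(s.guarantees)
    return facts


def lower(stages: List[Stage]) -> List[Stage]:
    """Lowering step: drop annotated stages so they never reach codegen."""
    return [s for s in stages if not s.assume_performed]


pipeline = [
    Stage("pad_and_pack", assume_performed=True, guarantees={"pad_value": 0.0}),
    Stage("conv2d_packed"),
]
print(collect_assumptions(pipeline))      # {'pad_value': 0.0}
print([s.name for s in lower(pipeline)])  # ['conv2d_packed']
```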