This is an automated email from the ASF dual-hosted git repository.
areusch pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/tvm-rfcs.git
The following commit(s) were added to refs/heads/main by this push:
new 87ff1fa [RFC] Introducing DeclBuffer (#70)
87ff1fa is described below
commit 87ff1facd55c0a7cef45efdf2b7548ee299e8e06
Author: Wuwei Lin <[email protected]>
AuthorDate: Thu Jun 9 18:03:20 2022 -0700
[RFC] Introducing DeclBuffer (#70)
* [RFC] Introducing DeclBuffer
Co-authored-by: Eric Lunderberg <[email protected]>
* Update 0070-introducing-decl-buffer.md
* Update 0070-introducing-decl-buffer.md
* Update 0070-introducing-decl-buffer.md
* Update 0070-introducing-decl-buffer.md
* Update 0070-introducing-decl-buffer.md
Co-authored-by: Eric Lunderberg <[email protected]>
---
rfcs/0070-introducing-decl-buffer.md | 230 +++++++++++++++++++++++++++++++++++
1 file changed, 230 insertions(+)
diff --git a/rfcs/0070-introducing-decl-buffer.md b/rfcs/0070-introducing-decl-buffer.md
new file mode 100644
index 0000000..6ada730
--- /dev/null
+++ b/rfcs/0070-introducing-decl-buffer.md
@@ -0,0 +1,230 @@
+- Feature Name: introducing-decl-buffer
+- Author: Wuwei Lin (@vinx13), Eric Lunderberg (@Lunderberg)
+- Start Date: 2022-05-04
+- RFC PR: [apache/tvm-rfcs#0000](https://github.com/apache/tvm-rfcs/pull/70)
+- GitHub Issue: https://github.com/apache/tvm/issues/11627
+
+# Summary
+[summary]: #summary
+
+This is a follow-up to https://github.com/apache/tvm/pull/9727 and
+[RFC#63](https://github.com/apache/tvm-rfcs/pull/63). Currently, a buffer can be implicitly
+declared and then used. This implicit behavior is error-prone and makes analysis more difficult.
+This RFC introduces `DeclBuffer`, a new IR construct that serves as an explicit statement for buffer declaration.
+
+# Motivation
+[motivation]: #motivation
+
+Currently a Buffer object can be created and then referenced in TIR, without explicit declaration
+or allocation. For example, in TVM script, one can use `T.buffer_decl` to create a new buffer and
+then use it in the rest of the program.
+```
+@T.prim_func
+def buffer_alias(A: T.Buffer[(16,), "float"]):
+    A_vector = T.buffer_decl([4], "float32x4", data=A.data)
+    T.evaluate(A_vector[0])  # read from buffer alias
+```
+However, `T.buffer_decl` doesn’t translate to a node in the AST. The resulting AST is:
+```
+PrimFunc {
+  buffer_map: {A_data: Buffer(data=A_data, ...)},
+  body: Evaluate {
+    BufferLoad {
+      buffer: Buffer(data = A.data, [4], "float32x4") # implicit creation of new buffer
+      index: [0]
+    }
+  }
+}
+```
+In this example, `BufferLoad` loads from an implicitly created new buffer that aliases another
+buffer. This example shows that a data variable can be used to create a buffer in arbitrary ways.
+There is no guarantee that the created buffer and the underlying data variable have consistent
+physical memory. This makes analysis in TIR difficult and error-prone, as one must always check
+whether a buffer in TIR is an implicitly created one.
+
+By introducing an explicit `DeclBuffer` statement, we can require that a buffer always be declared
+before any use. This makes the creation and use of buffers better managed within TIR.
+Developers (e.g., pass writers) can collect buffer information such as allocation and aliasing by
+visiting `DeclBuffer` nodes.
+
+# Guide-level explanation
+[guide-level-explanation]: #guide-level-explanation
+
+`DeclBuffer` will be defined as
+```
+class DeclBuffer : public Stmt {
+  Buffer buffer;  // the buffer declared
+  Stmt body;      // the scope of the buffer
+};
+```
+
+In TVM script, `T.buffer_decl` will be renamed to `T.decl_buffer` to make the name a verb phrase that
+is consistent with existing ones such as `T.alloc_buffer` and `T.match_buffer`. `T.decl_buffer`
+will be translated to a `DeclBuffer` object in TIR. This only changes the way the parser handles
+`T.decl_buffer`; the user-facing API of `T.decl_buffer` in TVM script will stay the same.
+
+In TIR, `DeclBuffer` will be handled in `StmtFunctor`. The visitor and mutator methods for `DeclBuffer` can be
+overridden to handle `DeclBuffer` in TIR passes.
+
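As an illustration of how a pass writer might collect aliasing information from explicit declarations, here is a minimal Python sketch. It is not TVM code: `Var`, `Buffer`, `Evaluate`, `Allocate`, `DeclBuffer`, `collect_declared_buffers`, and `find_aliases` are hypothetical stand-ins that only mirror the nesting described in this RFC, and the real `StmtFunctor` interface may differ.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass(frozen=True)
class Var:
    """Hypothetical stand-in for a TIR data variable."""
    name: str


@dataclass(frozen=True)
class Buffer:
    """Hypothetical stand-in for a TIR buffer backed by a data variable."""
    name: str
    data: Var
    shape: Tuple[int, ...]
    dtype: str


@dataclass
class Evaluate:
    """Leaf statement; stands in for any statement without a body."""
    note: str = ""


@dataclass
class Allocate:
    data: Var
    extent: int
    body: "Stmt"


@dataclass
class DeclBuffer:
    buffer: Buffer  # the buffer declared
    body: "Stmt"    # the scope of the buffer


Stmt = Union[Evaluate, Allocate, DeclBuffer]


def collect_declared_buffers(stmt: Stmt) -> Dict[str, List[str]]:
    """Walk nested bodies and group declared buffer names by data variable."""
    by_data: Dict[str, List[str]] = {}
    while not isinstance(stmt, Evaluate):
        if isinstance(stmt, DeclBuffer):
            by_data.setdefault(stmt.buffer.data.name, []).append(stmt.buffer.name)
        stmt = stmt.body
    return by_data


def find_aliases(stmt: Stmt) -> Dict[str, List[str]]:
    """Data variables that back more than one declared buffer."""
    declared = collect_declared_buffers(stmt)
    return {data: names for data, names in declared.items() if len(names) > 1}


a_data, b_data = Var("A_data"), Var("B_data")
program = Allocate(
    a_data, 16,
    DeclBuffer(
        Buffer("A", a_data, (16,), "float32"),
        DeclBuffer(
            Buffer("A_vector", a_data, (4,), "float32x4"),
            Allocate(
                b_data, 4,
                DeclBuffer(Buffer("B", b_data, (4,), "float32"), Evaluate()),
            ),
        ),
    ),
)
aliases = find_aliases(program)
```

Because every buffer in this toy IR enters scope through a `DeclBuffer` node, the alias query is a single traversal and does not need to inspect individual loads or stores.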
+# Reference-level explanation
+[reference-level-explanation]: #reference-level-explanation
+
+## Allocation of intermediate buffer
+An intermediate buffer inside a `PrimFunc` can be declared and allocated in the following way:
+
+```
+Allocate {
+  data: A_data{Var(data = ..., )},
+  extent: ...,
+  body: DeclBuffer {
+    buffer: Buffer(data=A_data, dtype=..., shape=...),
+    body: {
+      ...
+    }
+  }
+}
+```
+This can also be represented in TVMScript:
+```
+A_data = T.allocate(shape=..., dtype=...)
+A = T.decl_buffer(data=A_data)
+```
+
+## Declaration of buffer alias
+A buffer declared in `DeclBuffer` can reuse the data variable of another buffer. This creates a buffer
+alias.
+
+```
+DeclBuffer {
+  buffer: A(data=Var(name=...), dtype=..., shape=...),
+  body: {
+    DeclBuffer {
+      buffer: A_alias(data=A.data, ...),
+      body: ...
+    }
+  }
+}
+```
+
+## Replace `preflattened_buffer_map` with buffer alias
+
+Currently, `PrimFunc` has two maps, `preflattened_buffer_map` and `buffer_map`, to specify the input
+buffer shapes. Before the flattening passes (`FlattenBuffer` and `StorageFlatten`),
+`preflattened_buffer_map` is empty and `buffer_map` contains the logical shapes of the buffers.
+After flattening, the logical shapes are moved to `preflattened_buffer_map`, and `buffer_map`
+stores the physical shapes of the buffers. This change in the information stored in `buffer_map` can
+be confusing. These two maps can be unified into a single `buffer_map` that defines the logical
+shapes of the input buffers. Buffer access in physical shape, which is an internal behavior of
+`PrimFunc` after flattening, can be achieved by using `DeclBuffer` to create buffer aliases in
+physical shapes.
+
+This is illustrated in the example below.
+
+Before flattening:
+```
+@T.prim_func
+def elemwise(A: T.Buffer[(16, 16), "float32"], C: T.Buffer[(16, 16), "float32"]):
+    for i, j in T.grid(16, 16):
+        C[i, j] = A[i, j]
+```
+
+After flattening:
+```
+@T.prim_func
+def elemwise(A: T.Buffer[(16, 16), "float32"], C: T.Buffer[(16, 16), "float32"]):
+    A_flattened = T.decl_buffer((256,), "float32", data=A.data)
+    C_flattened = T.decl_buffer((256,), "float32", data=C.data)
+    for i, j in T.grid(16, 16):
+        C_flattened[i * 16 + j] = A_flattened[i * 16 + j]
+```
+
+Specifically, the updated flow of buffer flattening using `DeclBuffer` will be:
+1. Before `FlattenBuffer/StorageFlatten`: Buffers are declared in the `buffer_map` and are not flattened. Buffer access uses N-d unflattened indices.
+2. After `FlattenBuffer/StorageFlatten`, but before `MakePackedAPI`: Buffers are declared in the `buffer_map` and are not flattened. Buffer access goes through a buffer alias explicitly created via `DeclBuffer`; the alias shares the same data pointer but has a flattened shape and is accessed with flattened indices.
+3. After `MakePackedAPI`: The `buffer_map` is empty. Necessary information about the unflattened buffers, such as shapes and strides, becomes `AssertStmt` nodes in the IR, and the unflattened buffers are no longer accessible. Flattened buffers are declared using the handles extracted with `tvm_struct_get`, with an explicit `DeclBuffer` marking the use of each `T.handle` function parameter. These flattened buffers are accessed with flattened indices.
+
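The relationship between a logical buffer and its flattened alias can be sketched in plain Python, without TVM. In this hedged illustration, `array` objects play the role of the data pointers, a 2-D `memoryview` plays the role of the logical buffer from `buffer_map`, and a 1-D `memoryview` plays the role of the alias a `DeclBuffer` would introduce. The helper `flat_index` and all variable names are assumptions made for this example.

```python
from array import array

ROWS, COLS = 16, 16


def flat_index(i: int, j: int, cols: int = COLS) -> int:
    """Row-major flattening of a 2-D index, as in `i * 16 + j` above."""
    return i * cols + j


# Backing storage: stands in for the data variables A.data and C.data.
a_data = array("f", range(ROWS * COLS))
c_data = array("f", [0.0] * (ROWS * COLS))

# Logical (16, 16) views: stand in for the buffers declared in buffer_map.
A = memoryview(a_data).cast("B").cast("f", shape=[ROWS, COLS])
C = memoryview(c_data).cast("B").cast("f", shape=[ROWS, COLS])

# Flattened (256,) aliases over the same storage: stand in for the
# buffers that DeclBuffer would declare after flattening.
A_flattened = memoryview(a_data)
C_flattened = memoryview(c_data)

# The flattened loop body only touches the aliases.
for i in range(ROWS):
    for j in range(COLS):
        C_flattened[flat_index(i, j)] = A_flattened[flat_index(i, j)]

# The result is visible through the logical views because storage is shared.
copied_correctly = all(
    C[i, j] == A[i, j] for i in range(ROWS) for j in range(COLS)
)
```

The logical views are never rewritten; only the access path changes, which is the property the unified `buffer_map` relies on.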
+## TVM script updates
+* `T.allocate` will return a data variable instead of a buffer. If the subsequent program needs to access
+the data variable as a buffer, it should use `T.decl_buffer` to declare the buffer.
+* `T.buffer_decl` will be renamed to `T.decl_buffer`.
+
+## TIR validation
+With `DeclBuffer` introduced, we can implement utilities for TIR validation. They will enforce the following:
+* No implicit buffer declaration. In lowered TIR, buffers must be defined explicitly via `DeclBuffer`.
+* No undefined buffer. The buffer in a `DeclBuffer` must have been allocated; that is, the data variable
+of the buffer must come from the function parameters, an `AllocateNode`, an alias of another buffer, or
+the return value of another function (*).
+
+(*) Note: After `MakePackedAPI`, the backing buffers are the return value of `@tir.tvm_struct_get`.
+The data variable could also come from an entirely separate function call, such as `data: T.Ptr[T.int32] = T.call_extern("device_specific_malloc", 1024, dtype="handle")`.
+
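The two validation rules can be sketched as a scoped traversal. The following Python is a toy model under stated assumptions, not the validation utility proposed for TVM: `Buffer`, `Allocate`, `DeclBuffer`, `Use`, `Seq`, and `validate` are hypothetical names, `Use` stands in for any load or store, and data variables returned by function calls are not modeled.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Buffer:
    name: str
    data: str  # name of the backing data variable


@dataclass
class Allocate:
    data: str
    body: object


@dataclass
class DeclBuffer:
    buffer: Buffer
    body: object


@dataclass
class Use:
    """Stands in for any BufferLoad/BufferStore of `buffer`."""
    buffer: Buffer


@dataclass
class Seq:
    stmts: list


def validate(stmt, param_vars, param_buffers) -> List[str]:
    """Return a list of violations of the two rules; empty means valid."""
    errors: List[str] = []

    def visit(node, defined: FrozenSet[str], declared: FrozenSet[Buffer]):
        if isinstance(node, Allocate):
            # Allocate defines a data variable for its body.
            visit(node.body, defined | {node.data}, declared)
        elif isinstance(node, DeclBuffer):
            # Rule 2: the data variable must already be defined.
            if node.buffer.data not in defined:
                errors.append(
                    f"buffer '{node.buffer.name}' is declared on undefined "
                    f"data variable '{node.buffer.data}'"
                )
            visit(node.body, defined, declared | {node.buffer})
        elif isinstance(node, Use):
            # Rule 1: every used buffer must be explicitly declared.
            if node.buffer not in declared:
                errors.append(
                    f"buffer '{node.buffer.name}' is used without a DeclBuffer"
                )
        elif isinstance(node, Seq):
            for child in node.stmts:
                visit(child, defined, declared)

    visit(stmt, frozenset(param_vars), frozenset(param_buffers))
    return errors


A = Buffer("A", "A_data")                # function parameter buffer
A_vector = Buffer("A_vector", "A_data")  # alias of A
tmp = Buffer("tmp", "tmp_data")          # intermediate buffer
ghost = Buffer("ghost", "ghost_data")    # data variable never defined

good = DeclBuffer(
    A_vector,
    Allocate("tmp_data", DeclBuffer(tmp, Seq([Use(A), Use(A_vector), Use(tmp)]))),
)
bad = Seq([Use(A_vector), DeclBuffer(ghost, Use(ghost))])

good_errors = validate(good, {"A_data"}, {A})
bad_errors = validate(bad, {"A_data"}, {A})
```

In this sketch `good_errors` is empty, while `bad_errors` reports one implicit use and one declaration on an undefined data variable. Scoping falls out of the recursion: a buffer is only visible inside the `body` of its `DeclBuffer`.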
+## Engineering plan
+This RFC introduces a TIR change that may require significant refactoring of the existing codebase.
+The work can be decomposed into three parts to keep each pull request small.
+
+- Part 1: Introduce the `DeclBuffer` data structure and add corresponding visitors in IR functors.
+- Part 2: Refactor existing passes and test cases to use `DeclBuffer`.
+- Part 3: Enforce the usage of `DeclBuffer`. No implicit buffer declarations are allowed.
+
+# Rationale and alternatives
+In S-TIR, an alternative is to define buffer declarations inside the block, similar to the
+existing `alloc_buffers` and `match_buffers`:
+
+```
+class Block : public Stmt {
+  /*! \brief The buffer allocated in the block. */
+  Array<Buffer> alloc_buffers;
+  /*! \brief The match buffer regions. */
+  Array<MatchBufferRegion> match_buffers;
+  /*! \brief The buffer declared in the block. */
+  Array<Buffer> decl_buffers;
+};
+```
+This unifies the scope of `DeclBuffer` with the block scope. In low-level TIR, a `DeclBuffer`
+statement is still needed because Block is not available in low-level TIR. This is similar to the
+current situation, in which `block->alloc_buffers` is lowered to Allocate. For now, since there is no need
+for `DeclBuffer` during TIR scheduling, we would like to avoid introducing `block->decl_buffers` to
+keep things simple. It can be added incrementally on top of this RFC when future needs come up.
+
+Another option would be to separate the concepts of memory allocation and buffer access.
+A memory allocation would represent the allocation of some number of bytes, and would always use
+physical shape. Each buffer would have a backing allocation, would represent access into some
+tensor, and would use logical/transformed shape. Overall, it would be the difference between having
+one "real" buffer and multiple aliases, as opposed to having several buffers and a memory
+allocation backing them, emphasizing that there’s nothing special about the first buffer. We decided
+this isn’t necessary, because it would add considerably more boilerplate for the most common case of one
+buffer, and would encourage people to make buffer aliases when not necessary.
+
+# Drawbacks
+The scope of the buffer in `DeclBuffer` is defined by its `body` field. This adds a level of recursion in
+TIR visitors. Since the number of buffers declared inside a `PrimFunc` is usually small, this is
+unlikely to be a concern.
+
+# Prior art
+[prior-art]: #prior-art
+
+Buffer declaration is implicitly supported prior to this RFC. In TVM script, `T.buffer_decl` is used
+to declare a buffer, which can then be used in other TIR expressions and/or statements. This RFC is intended
+to formalize this process by using an explicit `DeclBuffer` statement.
+
+# Unresolved questions
+[unresolved-questions]: #unresolved-questions
+
+Should low-level code generators handle buffer aliases? One option would be to remove them in a
+lowering pass. Another option would be to use them to represent explicit type casts, rather than
+having any implicit type casts.
+
+When `DeclBuffer` creates a buffer alias, what are the requirements (`shape`, `dtype`,
+`elem_offset`, etc.) of the aliasing buffer? The current behavior of implicit buffer aliasing
+is to assume the aliasing buffer is valid, and to rely on codegen to handle buffer aliases.
+
+# Future possibilities
+[future-possibilities]: #future-possibilities
+
+With an explicit `DeclBuffer` statement in TIR, we can introduce analysis passes for buffer aliasing.
+This will help existing TIR passes explicitly check whether their assumptions about buffer
+aliasing are satisfied.
+
+After this RFC, lowered TIR needs two separate statements, `T.allocate` and `T.decl_buffer`, to allocate a buffer data pointer and then declare the buffer. In the future, we can consider providing syntactic sugar to allow `T.allocate` to return a buffer. This would require some investigation into how to achieve bidirectional translation between TVMScript and TIR.
+