Lunderberg commented on a change in pull request #39:
URL: https://github.com/apache/tvm-rfcs/pull/39#discussion_r739488316



##########
File path: rfcs/0039-buffer-physical-layout.md
##########
@@ -0,0 +1,522 @@
+- Feature Name: Buffer Physical Layout
+- Start Date: 2021-10-05
+- RFC PR: [apache/tvm-rfcs#0039](https://github.com/apache/tvm-rfcs/pull/0039)
+- GitHub Issue: Not Yet Written
+- Related RFCs: [RFC#0042](https://github.com/apache/tvm-rfcs/pull/0042)
+
+# Summary
+[summary]: #summary
+
+This RFC introduces layout transformations that can be applied to a
+buffer during the lowering process.  These transformations will be
+part of the schedule, allowing the same compute definition to be used
+across multiple different layouts.
+
+[RFC#0042](https://github.com/apache/tvm-rfcs/pull/0042) is intended
+to make these buffer transformations easier to write, though it isn't
+strictly necessary for this change.
+
+# Motivation
+[motivation]: #motivation
+
+Currently, TVM assumes that all buffers can be treated as flat memory.
+That is, while a tensor may have N dimensions, the underlying buffer
+allocated by the low-level codegen has a single value defining the
+size, and access into that buffer is done using a single index.  This
+assumption holds for most cases, such as a CPU accessing RAM, but
+doesn't hold in all cases.  For example, texture memory on a GPU
+requires two indices to access.
+
+In addition, computations that are semantically identical (e.g. 2-d
+convolution) require independent compute definitions and schedules
+(e.g. `conv2d_nchw` and `conv2d_hwcn`) based on the format of the data
+accepted as input.
+
+This RFC introduces a mechanism to specify transformations to be
+applied to the layout of buffers in memory, including the option to
+present multi-dimensional indices to the low-level code generators.
+This will allow for target-specific handling of non-flat memory, and
+will allow for code re-use across compute definitions that differ only
+in memory layout.
+
+# Guide-level explanation
+[guide-level-explanation]: #guide-level-explanation
+
+A buffer is represented by a `tvm::tir::Buffer` object, and has some
+shape associated with it.  This shape is initially defined from the
+buffer's shape in the compute definition.  Buffers can either be
+allocated within a `tvm::tir::PrimFunc` using a `tvm::tir::Allocate`
+node, or can be passed in as parameters to a `PrimFunc`.  Buffer
+access is done using `tvm::tir::BufferLoad` and
+`tvm::tir::BufferStore` for reads and writes, respectively.
+
+When a TIR graph is passed into the low-level code generator
+`tvm::codegen::Build`, the dimensionality of each buffer must be
+supported by the target code generator.  Typically, this will mean
+generating a 1-dimensional index representing access into flat memory.
+Some code generators may attach alternative semantics for
+multi-dimensional buffers (e.g. 2-d buffers to represent texture
+memory on OpenCL).  A low-level code generator should check the
+dimensionality of the buffers it is acting on, and give a diagnostic
+error for unsupported dimensionality.
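+As a hedged illustration (plain Python, not TVM's actual codegen
+interface; the names `SUPPORTED_DIMS` and `check_buffer_dims` are
+hypothetical), such a dimensionality check might look like:
+
+```python
+# Hypothetical sketch: a backend that supports flat (1-d) buffers, plus
+# 2-d buffers (e.g. for texture memory), rejects any other dimensionality
+# with a diagnostic error.
+SUPPORTED_DIMS = {1, 2}  # assumed per-target capability set
+
+def check_buffer_dims(buffer_shapes):
+    """Raise a diagnostic error for buffers of unsupported dimensionality."""
+    for name, shape in buffer_shapes.items():
+        if len(shape) not in SUPPORTED_DIMS:
+            raise ValueError(
+                f"Buffer '{name}' has {len(shape)} physical dimensions; "
+                f"this target supports {sorted(SUPPORTED_DIMS)}"
+            )
+
+check_buffer_dims({"A": (1024,), "tex": (64, 64)})  # both accepted
+```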
+
+To define the layout transformation in a TE schedule, use the
+`transform_layout` method of a schedule, as shown below.  The
+argument to `transform_layout` is a function that accepts a list of
+`tvm.tir.Var` representing a logical index and returns a list of
+`tvm.tir.PrimExpr` giving the corresponding physical index.  If
+`transform_layout` isn't called, then no additional layout
+transformations are applied.
+
+For example, the schedule below defines the reordering from NHWC
+logical layout to
+NCHWc physical layout.  Similar to `cache_read` and `cache_write`, the
+`transform_layout` method introduces a new stage in the schedule.
+
+```python
+# Compute definition, written in terms of NHWC logical axes
+B = te.compute(A.shape, lambda n,h,w,c: A[n,h,w,c])
+s = te.create_schedule(B.op)
+
+def nhwc_to_nchwc(logical_axes):
+    n,h,w,c = logical_axes
+    return [n, c//4, h, w, c%4]
+
+B_nchwc = s[B].transform_layout(nhwc_to_nchwc)
+
+# Compute definition that would produce an equivalent physical layout
+B_equivalent = te.compute(
+    [A.shape[0], A.shape[3]//4, A.shape[1], A.shape[2], 4],
+    lambda n, c_outer, h, w, c_inner: A[n, h, w, 4*c_outer+c_inner],
+)
+```
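+As a plain-Python sanity check (no TVM required), the index mapping
+performed by `nhwc_to_nchwc` can be evaluated directly on a concrete
+logical index:
+
+```python
+# Plain-Python sketch of the logical-to-physical index mapping used by
+# nhwc_to_nchwc above: [n, h, w, c] -> [n, c//4, h, w, c%4].
+def nhwc_to_nchwc(logical_index):
+    n, h, w, c = logical_index
+    return [n, c // 4, h, w, c % 4]
+
+# Channel 10 lands in outer channel block 2, inner position 2.
+assert nhwc_to_nchwc([0, 5, 7, 10]) == [0, 2, 5, 7, 2]
+```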
+
+By default, after all explicitly specified layout transformations are
+applied, all axes are flattened to a single physical axis by following

Review comment:
       Thank you, and updated.



