[GitHub] [tvm-rfcs] Lunderberg commented on pull request #42: [TIR] Adding BufferPointer node, used by BufferLoad and BufferStore

GitBox Fri, 05 Nov 2021 09:00:53 -0700


Lunderberg commented on pull request #42:
URL: https://github.com/apache/tvm-rfcs/pull/42#issuecomment-962016078



   Following a conversation with @vinx13, below are the overall points of 
discussions.
   
   - The improvement to writability/readability aren't as universal as I had 
previously thought.  It improves some use cases, is neutral to others, but 
there are some transforms that are made somewhat worse by using `BufferAccess`. 
Examples that we found of each are shown below.
     
     - `PrimFuncSpecializer` is made much simpler by using `BufferAccess`.  
This is generally the case for transforms that replace buffers/indices.
     
       <details>
         <summary>Current implementation (click to expand)</summary>
         
         ```c++
         Stmt VisitStmt_(const BufferStoreNode* op) final {
           Stmt stmt = StmtExprMutator::VisitStmt_(op);
           op = stmt.as<BufferStoreNode>();
           ICHECK(op != nullptr);
           auto it = buffer_map_.find(op->buffer);
           if (it == buffer_map_.end()) {
             return GetRef<BufferStore>(op);
           } else {
             auto n = CopyOnWrite(op);
             n->buffer = it->second;
             return Stmt(n);
           }
         }
         
         PrimExpr VisitExpr_(const BufferLoadNode* op) final {
           PrimExpr expr = StmtExprMutator::VisitExpr_(op);
           op = expr.as<BufferLoadNode>();
           ICHECK(op != nullptr);
           auto it = buffer_map_.find(op->buffer);
           if (it == buffer_map_.end()) {
             return GetRef<BufferLoad>(op);
           } else {
             auto n = make_object<BufferLoadNode>(*op);
             n->buffer = it->second;
             return PrimExpr(n);
           }
         }
         ```
       </details>
     
       <details>
         <summary>Cleaner implementation, current TIR with helper function 
(click to expand)</summary>
         
         ```c++
         Stmt VisitStmt_(const BufferStoreNode* op) final {
           BufferStore store = 
Downcast<BufferStore>(StmtExprMutator::VisitStmt_(op));
         
           auto writer = store.CopyOnWrite();
           ModifyBuffer(writer->buffer);
         
           return std::move(store);
         }
         
         PrimExpr VisitStmt_(const BufferLoadNode* op) final {
           BufferLoad load = 
Downcast<BufferLoad>(StmtExprMutator::VisitExpr_(op));
         
           auto writer = load.CopyOnWrite();
           ModifyBuffer(writer->buffer);
         
           return std::move(load);
         }
         
         void ModifyBuffer(Buffer& buffer) {
           auto it = buffer_map_.find(buffer);
           if (it != buffer_map_.end()) {
             buffer = it->second;
           }
         }
         ```
       </details>
       
       <details>
         <summary>Cleanest implementation, uses `BufferAccess` (click to 
expand)</summary>
         
         ```c++
         BufferAccess VisitExpr_(const BufferAccessNode* op) final {
           BufferAccess access = 
Downcast<BufferAccess>(StmtExprMutator::VisitExpr_(op));
         
           auto it = buffer_map_.find(op->buffer);
           if (it != buffer_map_.end()) {
             auto writer = access.CopyOnWrite();
             writer->buffer = it->second;
           }
         
           return Downcast<PrimExpr>(access);
         }
         ```
       </details>
     
     - `ComputeInliner` is largely unchanged by using `BufferAccess`.  This is 
largely the case for transforms that read the buffers/indices, then perform 
some replacement to remove the buffer use.
     
       <details>
         <summary> Current implementation (click to expand)</summary>
         
         ```c++
         PrimExpr VisitExpr_(const BufferLoadNode* _load) final {
           BufferLoad load = 
Downcast<BufferLoad>(StmtExprMutator::VisitExpr_(_load));
           if (!load->buffer.same_as(inlined_buffer_)) {
             return std::move(load);
           }
           return ReplaceInlinedBuffer(std::move(load));
         }
         
         PrimExpr ReplaceInlinedBuffer(BufferLoad load) {
           SetIndexSubstitution(load->indices);
           return Substitute(inlined_store_->value, idx_sub_);
         }
         ```
       </details>
       
       <details>
         <summary> Implementation with `BufferAccess` (click to 
expand)</summary>
         
         ```c++
         PrimExpr VisitExpr_(const BufferLoadNode* _load) final {
           BufferLoad load = 
Downcast<BufferLoad>(StmtExprMutator::VisitExpr_(_load));
           if (!load->access->buffer.same_as(inlined_buffer_)) {
             return std::move(load);
           }
           return ReplaceInlinedBuffer(std::move(load));
         }
         
         PrimExpr ReplaceInlinedBuffer(BufferLoad load) {
           SetIndexSubstitution(load->access->indices);
           return Substitute(inlined_store_->value, idx_sub_);
         }
         ```
       </details>
       
     - `CacheReadRewriter` is made worse by use of `BufferAccess`.  This is 
largely the case when a transform interacts with only loads, or only stores.
       
       <details>
         <summary> Current implementation (click to expand)</summary>
         
         ```c++
         PrimExpr VisitExpr_(const BufferLoadNode* load) final {
           if (load->buffer.same_as(info_->read_buffer)) {
             ObjectPtr<BufferLoadNode> n = make_object<BufferLoadNode>(*load);
             n->buffer = info_->write_buffer;
             return PrimExpr(n);
           }
           return ExprMutator::VisitExpr_(load);
         }
         ```
       </details>
       
       <details>
         <summary> Visiting `BufferAccess` (click to expand)</summary>
         
         ```c++
         // Need to make sure that we only rewrite access that occurs within
         // BufferLoad, which was implicit in the current implementation from
         // only visiting BufferLoad.
         
         Stmt VisitStmt_(const BufferStoreNode* op) final {
           ICHECK(!op->access->buffer.same_as(info_->read_buffer))
               << "Unexpected write to cache_read buffer " << 
op->access->buffer->name;
           return StmtExprMutator::VisitStmt_(op);
         }
         
         PrimExpr VisitExpr_(const BufferAccessNode* op) final {
           auto access = 
Downcast<BufferAccess>(StmtExprMutator::VisitExpr_(op));
           if (access->buffer.same_as(info_->read_buffer)) {
             auto writer = access.CopyOnWrite();
             writer->buffer = info_->write_buffer;
           }
           return std::move(access);
         }
         ```
       </details>
       
       <details>
         <summary> Rewrite `BufferAccess` while visiting `BufferLoad` (click to 
expand)</summary>
         
         ```c++
         PrimExpr VisitExpr_(const BufferLoadNode* op) final {
           auto load = Downcast<BufferLoad>(StmtExprMutator::VisitExpr_(op));
           
           if (load->access->buffer.same_as(info_->read_buffer)) {
             // Need to rewrite the `BufferAccess` before we can rewrite the 
`BufferLoad`.
             BufferAccess access = load->access;
             auto access_writer = access.CopyOnWrite();
             access_writer->buffer = info_->write_buffer;
     
             auto load_writer = load.CopyOnWrite();
             load_writer->access = access;
           }
           return std::move(load);
         }
         ```
       </details>
     
     
   - The `BufferAccess` object would have the potential to replace the 
`tvm_access_ptr`, which is used to generate a pointer into a buffer, which can 
then be passed to hardware intrinsic functions.
   
     - Pros
       - Layout transformations that occur on graphs including a 
`tvm_access_ptr` do not need special handling.
       - Could replace the built-in call with a TIR node, which would be easier 
to access from other transformations.
         
     - Cons
       - TIR graphs that use `tvm_access_ptr` must conform to the layout that 
is used by the hardware intrinsic, and probably shouldn't have layout 
transformations applied to it.
       - Would introduce pointers as a first-class concept in TIR.
   
   
   Overall, our conclusion was that since the ergonomics of `BufferAccess` can 
make some types of transforms worse to read, there isn't a strong argument to 
add it in.  The [specific 
commit](https://github.com/apache/tvm/pull/9391/commits/6ac92751137bd73e4483b04ee486e79ef62a9322)
 that simplified transforms that could be simplified with `BufferAccess` 
identifies passes where a helper function could simplify the pass itself.  We'd 
like to wait a few weeks to see additional arguments in favor of introducing 
`BufferAccess` come up, but otherwise would break the tie in favor of not 
changing the existing TIR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [tvm-rfcs] Lunderberg commented on pull request #42: [TIR] Adding BufferPointer node, used by BufferLoad and BufferStore

Reply via email to