Lunderberg commented on PR #77:
URL: https://github.com/apache/tvm-rfcs/pull/77#issuecomment-1163436177

   > Indeed it is important to avoid having a separate compute definition for 
each workload on a new target. In this particular case, all computation 
definition would start with the original layout. Then there is a "schedule 
transformation" like transform layout which will generate the new stage as part 
of the scheduling process.
   
   Thank you, that is roughly how I see it as well: everything starts with the 
base compute definition and is modified from there.  If I understand correctly, 
the main differences between the two options are as follows.
   
   * Option A: Layout transformations of inputs are allowed, but only during 
initial graph-level optimization.  When optimizing an individual PrimFunc, 
layout transformations of inputs and outputs are not allowed.
   
   * Option B: Layout transformations of inputs and outputs are not allowed.  
If such a transformation is desired, it should be done by first introducing a cache stage in 
TIR, then transforming the layout of the cache, and finally by a graph-level 
transformation that inspects each PrimFunc and hoists the cache stage out.
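   
   To make the Option B flow concrete, here is a minimal sketch in plain Python. 
It is not TVM code: lists stand in for buffers, and `to_chunked`, `compute`, 
`func_with_cache`, and `func_hoisted` are hypothetical names used only for 
illustration.

```python
# Illustrative only: plain Python lists stand in for TIR buffers.

def to_chunked(flat, chunk=4):
    """Layout transform i -> (i // chunk, i % chunk).

    Assumes len(flat) is a multiple of chunk, so no padding is needed.
    """
    return [flat[i:i + chunk] for i in range(0, len(flat), chunk)]


def compute(chunked):
    """Stand-in for the compute body, written against the transformed layout."""
    return [sum(row) for row in chunked]


def func_with_cache(flat):
    """Steps 1 and 2: the function keeps its original input layout and
    holds the transformed layout in an internal cache stage."""
    cache = to_chunked(flat)
    return compute(cache)


def func_hoisted(chunked):
    """Step 3: after hoisting, the caller supplies the transformed layout
    and the cache stage no longer lives inside the function."""
    return compute(chunked)


data = list(range(8))
# Hoisting the cache stage must not change the result.
assert func_with_cache(data) == func_hoisted(to_chunked(data))
```

   In this sketch the graph-level step amounts to moving the `to_chunked` call 
from the callee to the caller.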
   
   > The particular stage can be marked, which contains effectively the same 
information as BufferConstraint, except that it does not introduce new data 
structures. During global layout reflowing, such information can be used to 
guide the reflowing to reconstruct a data structure like BufferConstraint or 
other Layout mappings and use that to serve the same purpose.
   
   So long as the constraints can be statically searched for, this approach 
makes sense to me.  I would be more concerned about adding additional semantics 
to existing nodes, such as an AttrStmt node, since it then requires passes to be 
aware not only of the existence of the constraint, but also that it must be 
reconstructed from the existing data structure.  Overloading existing nodes in 
this way would make it much more difficult for a static analysis tool to 
identify locations where the constraints must be updated.
   
   As a possible way forward, what if we start by implementing 
pad values only for buffers that are allocated internally to a function?  This 
would be allowed behavior under both Option A and Option B, and would help 
determine how difficult reconstruction of the constraints would be from the 
transformation block without any additional annotation.  This could help 
motivate whether additional annotations are necessary, regardless of whether 
they are stored alongside the Buffer itself or in a separate 
attribute/annotation.
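
   As a rough illustration of pad values on an internally allocated buffer, 
here is a small sketch in plain Python rather than TIR. The names 
`transform_with_padding` and `row_sums` are hypothetical, and the choice of 
0 as the pad value is an assumption that only works because 0 is the identity 
for the sum in this example.

```python
# Illustrative only: plain Python lists stand in for TIR buffers.

def transform_with_padding(flat, chunk=4, pad_value=0):
    """Apply i -> (i // chunk, i % chunk), filling the unused tail of the
    last row with pad_value when len(flat) is not a multiple of chunk."""
    n_rows = -(-len(flat) // chunk)  # ceiling division
    padded = flat + [pad_value] * (n_rows * chunk - len(flat))
    return [padded[r * chunk:(r + 1) * chunk] for r in range(n_rows)]


def row_sums(flat):
    """The padded buffer is allocated and consumed inside this function,
    so the caller never sees the transformed layout or the padding."""
    cache = transform_with_padding(flat, chunk=4, pad_value=0)
    # Because the pad value is 0, each row can be reduced over its full
    # width without a bounds check on the final, partially filled row.
    return [sum(row) for row in cache]


print(transform_with_padding(list(range(14)))[-1])  # last row holds the padding
print(row_sums(list(range(14))))
```

   Since the padded buffer never escapes the function, the constraint on its 
pad value only has to hold between the stage that writes it and the stage that 
reads it.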

