Lunderberg commented on pull request #39:
URL: https://github.com/apache/tvm-rfcs/pull/39#issuecomment-959696021


   > Usage of te.AXIS_SEPARATOR: It seems this is only used in the API side but 
not in BufferTransform, would be good to get some clarification.
   
   That's correct, the `te.AXIS_SEPARATOR` only appears in the API for the TE 
schedules, and not in the TIR graph generated from the TE schedule.  I've 
updated the RFC with a paragraph on this distinction.
   
   For the axes returned, I'd want to add a third option.
   
   * T2: Using `te.AXIS_SEPARATOR` to separate groups of axes, e.g. `AA = 
s[A].transform_layout(lambda i, j: [j//4, te.AXIS_SEPARATOR, i, j%4])`, 
directly implies fusion of the `i` and `j%4` axes, but exposes each transformed 
axis for use in the TE schedule.  That is, `AA.op.axis` has three elements, 
whose iteration bounds are `[A.shape[1]//4, A.shape[0], 4]`.
   
   Then my thoughts on each option are below, with T2 as my preferred option.
   
   * T0: This is easy to write in te, but any additional manipulation of the 
axis (e.g. changing the loop ordering independent of the data layout) would 
require splitting the returned axis to match the layout transformation.  It 
would also mean that `lambda *indices: indices` isn't the identity function, 
because it would flatten the buffer as part of that transformation.
   * T1: As you mentioned, this would require explicitly writing down the 
transformation to fuse the axes, which would be error-prone.  It would also 
make the default behavior, where all buffer axes are fused into a single index, 
be more of a special case.
   * T2: Doesn't require the user to specify the transformation to fuse the 
axes, and allows the corresponding loops to be easily reorganized.  The fusion 
in the default case, where no `transform_layout` is applied, then follows the 
same pattern, where a `BufferTransform` node is inserted to fuse all remaining 
axes into `num_axis_separators+1 == 1` axes.
   
   
   > From TIR's perspective, the same primitive can be introduced to TIR as 
well. It will be an eager transformation (without needing to create 
BufferTransform node) that rewrite the both producer/consumer immediately upon 
calling it. This is required if we want to further scheduling the transformed 
block. I can follow up implementing it in TIR later
   
   When the transformation is performed, would that include fusing axes 
together?  If it does, then the `FlattenBuffer` pass would need distinguish 
between a 2-d buffer that should be flattened to a flat 1-d buffer, and a 2-d 
buffer that has already been flattened from some larger dimension.  If it 
doesn't, then the `FlattenBuffer` pass would need to distinguish between a N-d 
buffer that should be flattened to a 1-d buffer, and an N-d buffer that should 
be flattened to a 2-d buffer.  In either case, `FlattenBuffer` would need some 
additional metadata to know how the buffer should be handled.
   
   I like the idea of having the transformation be done eagerly, to avoid 
needing to pass through additional information, but I'm not quite seeing how it 
could be done.  (I had been trying an initial implementation that way on the TE 
side.)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to