yzh119 opened a new pull request #10420:
URL: https://github.com/apache/tvm/pull/10420


   Previously, the loops `i`/`j` could not be bound to any data-parallel physical 
threads because `outer` was classified as neither a `CompleteBlock` nor a 
`ReductionBlock`:
   
   1. `outer` both reads and writes `b`, so it is not a complete block.
   2. `outer` has no `init` sub-block, so it is not a reduction block.
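
The two failing checks can be illustrated with a toy model. This is only a sketch: `BlockInfo`, `is_complete_like` and `is_reduction_like` are hypothetical names, not TVM APIs, and they model just the two conditions cited above. The actual TVM checks involve further conditions.

```python
from dataclasses import dataclass
from typing import Set


@dataclass
class BlockInfo:
    """Hypothetical summary of a block: buffers read, buffers written, init presence."""
    reads: Set[str]
    writes: Set[str]
    has_init: bool


def is_complete_like(block: BlockInfo) -> bool:
    # Simplified condition 1: a complete block must not read a buffer it writes.
    return not (block.reads & block.writes)


def is_reduction_like(block: BlockInfo) -> bool:
    # Simplified condition 2: a reduction block must carry its own init statement.
    return block.has_init


# `outer` as seen from outside: it reads `a` and `b`, writes `b`,
# and the init lives in `inner`, not in `outer` itself.
outer = BlockInfo(reads={"a", "b"}, writes={"b"}, has_init=False)
print(is_complete_like(outer), is_reduction_like(outer))
```

Under this simplified model `outer` fails both checks, which matches the two points listed above.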
   
   ```python
   @T.prim_func
   def nested_block_bind(a_ptr: T.handle, b_ptr: T.handle):
       a = T.match_buffer(a_ptr, [16, 16, 16, 16], "float32")
       b = T.match_buffer(b_ptr, [16, 16, 16], "float32")
       for i, j in T.grid(16, 16):
           with T.block("outer"):
               vi, vj = T.axis.remap("SS", [i, j])
               for k, l in T.grid(16, 16):
                   with T.block("inner"):
                       vk, vl = T.axis.remap("SR", [k, l])
                       with T.init():
                           b[vi, vj, vk] = 0.0
                       b[vi, vj, vk] = b[vi, vj, vk] + a[vi, vj, vk, vl]
   ```
   
   This PR changes the rule for determining compact dataflow: we 
recursively find the innermost dominant sub-block and determine whether it is 
compact. The rule for finding the innermost dominant sub-block is as 
follows:
   
   ```
   func innermost_dominant_sub_block(block):
       if not all_block_vars_data_parallel(block):
           return block
       if has_more_than_one_child_block(block):
           return block
       child_block = get_child_block(block)
       if not is_dominant(child_block):
           return block
       return innermost_dominant_sub_block(child_block)
   ```
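
As a rough illustration, the pseudocode above can be run against a toy block tree. The `Block` dataclass, its `iter_types` string (mirroring the `"SS"`/`"SR"` strings passed to `T.axis.remap`) and the precomputed `dominant` flag are assumptions made for this sketch, not TVM data structures. The sketch also returns the block itself when it has no child block, a case the pseudocode leaves implicit.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Block:
    """Hypothetical stand-in for a TIR block in this sketch."""
    name: str
    iter_types: str  # one char per block var: "S" = data-parallel, "R" = reduction
    dominant: bool = True  # assumed to be computed elsewhere
    children: List["Block"] = field(default_factory=list)


def innermost_dominant_sub_block(block: Block) -> Block:
    # Stop if any block var is not data-parallel.
    if any(t != "S" for t in block.iter_types):
        return block
    # Stop unless there is exactly one child block (covers zero and many).
    if len(block.children) != 1:
        return block
    child = block.children[0]
    # Stop if the only child is not dominant.
    if not child.dominant:
        return block
    return innermost_dominant_sub_block(child)


# Mirrors nested_block_bind: `outer` ("SS") wraps a single `inner` ("SR").
inner = Block("inner", "SR")
outer = Block("outer", "SS", children=[inner])
print(innermost_dominant_sub_block(outer).name)  # inner
```

For the `nested_block_bind` example, the search descends from `outer` to `inner` and stops there because `inner` has a reduction block var, so the compactness check is applied to `inner`.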

