wrongtest opened a new pull request #9372:
URL: https://github.com/apache/tvm/pull/9372


   Hi there~ This PR aims to enhance the `CompactBufferRegion` and 
`GetBlockAccessRegion` for conditional buffer accesses.
   
   Currently, they are not aware of conditions and thus may produce region 
bounds that are not tight. Take crop as an example:
   ```python
   with T.block() as []:
       B = T.alloc_buffer((18,), dtype="float32")
       for i in range(0, 20):
           with T.block():
               T.evaluate(T.if_then_else(2 <= i and i < 18, B[i - 2], 0.0, dtype="float32"))
   ```
   The compact buffer pass would infer that the accessed region of B is 
B[-2:18] and re-allocate a new buffer with size=18-(-2)=20, since it does not 
know that B is only accessed if i is in [2, 18).
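
The arithmetic behind the loose bound, and the tighter bound the condition allows, can be checked with a small interval sketch. This is plain Python, not TVM's arithmetic analyzer; `shift` is a hypothetical helper introduced only for illustration, and intervals are assumed to be half-open `(lo, hi)` pairs.

```python
# Toy half-open interval arithmetic; not TVM's arith API.
def shift(interval, offset):
    """Region touched by index `i + offset` when i ranges over [lo, hi)."""
    lo, hi = interval
    return (lo + offset, hi + offset)

# Without the condition, i covers the whole loop range [0, 20).
loose = shift((0, 20), -2)
# With the condition 2 <= i < 18, only i in [2, 18) reaches B[i - 2].
tight = shift((2, 18), -2)

print(loose, loose[1] - loose[0])  # (-2, 18) 20
print(tight, tight[1] - tight[0])  # (0, 16) 16
```

Under these assumptions, a condition-aware analysis would see the access as B[0:16] instead of B[-2:18].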
   
   To take conditions into consideration, the PR makes several changes as 
described below:
   - Relax the buffer region just at the access point
   
       If conditions are considered, the domain map used to relax the buffer 
accesses is no longer uniform. Thus we cannot simply collect all inner buffer 
accesses at the block visit point and relax them all with 
`iter_dom_map_on_post_order_`.
   
       As a workaround, the PR instead records the allocation point (an index 
into `ancestor_loops_`) for each buffer, so that at the buffer access visit 
point it knows which loop vars should not be relaxed (generally the loops 
outside the allocation scope). The function `VisitBufferAccess()` no longer 
needs to record the access onto a stack; it relaxes and unions the access 
region immediately, using a globally maintained `dom_map_` that is aware of 
condition bounds. 
   
      - original:  (1) visit block begin -> ... -> (2) visit inner buffer access 
(record access) -> ... ->  (3) visit block end (collect and relax all accesses 
in scope using the same `iter_dom_map_on_post_order_`) 
      - new:  (1) visit block begin -> ... -> (2) visit inner buffer access 
(relax and union with the current `dom_map_`, at the extra cost of excluding 
non-relaxed loop vars from `dom_map_`) -> ... ->  (3) visit block end
   
   - Implement visit logic for `tir.IfThenElse` and `tir.if_then_else()` calls, 
and update the var bounds deduced from the condition in each branch.
   
   - Implement an intset difference utility function. If the global intset is A 
and the intset deduced from the condition is B, then the bounded intset on the 
true branch is `Intersect(A, B)` and the bounded intset on the false branch is 
`Difference(A, B)`.
   
   - Clear read/write annotations for non-opaque blocks in pass 
`ConvertBlocksToOpaque` (possibly ill-advised)
   
        `ConvertBlocksToOpaque` is the pass before `CompactBufferRegion`. Since 
block read/write annotations are non-conditional, the conditional access info 
is lost when a block with point accesses is converted.
       ```python
       for i in range(20):
           with T.block():
               T.reads([B[i - 2]])
               T.evaluate(T.if_then_else(2 <= i and i < 18, B[i - 2], 0.0, dtype="float32"))
       ```
       For a buffer allocated outside the block scope, the compact buffer pass 
currently does not look into the detailed accesses in the block scope but uses 
the block's annotations, and thus still tries to relax B[i - 2] without 
condition awareness. The PR tries to overcome this problem by differentiating 
two circumstances:
       - (1) the block is opaque with reads/writes annotations:  treat the 
block as opaque as before and just use the annotations.
       - (2) the block is opaque but reads/writes are empty: treat the block as 
"transparent" and try to visit the buffer accesses within the block.
   
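The "relax at the access point" flow from the first item above can be sketched as a toy model. This is plain Python and not the C++ pass itself: `relax_map_for`, `visit_buffer_access`, and the `state` dictionary are hypothetical names for illustration, indices are assumed to have the form `var + offset`, and intervals are half-open `(lo, hi)` pairs.

```python
def relax_map_for(buffer, state):
    """Keep only the loop vars opened inside the buffer's allocation scope."""
    start = state["alloc_point"][buffer]          # index into ancestor_loops
    inner_vars = state["ancestor_loops"][start:]  # loops that may be relaxed
    return {v: state["dom_map"][v] for v in inner_vars}

def visit_buffer_access(buffer, var, offset, state):
    """Relax index `var + offset` immediately and union it into the region."""
    relax = relax_map_for(buffer, state)
    lo, hi = relax[var]  # sketch: assumes `var` is one of the relaxed loop vars
    region = (lo + offset, hi + offset)
    old = state["regions"].get(buffer)
    if old is not None:
        # Union as a covering interval of the old and new regions.
        region = (min(old[0], region[0]), max(old[1], region[1]))
    state["regions"][buffer] = region

state = {
    "ancestor_loops": ["n", "i"],            # outer loop n, inner loop i
    "alloc_point": {"B": 1},                 # B is allocated inside loop n
    "dom_map": {"n": (0, 4), "i": (2, 18)},  # i already narrowed by the condition
    "regions": {},
}
visit_buffer_access("B", "i", -2, state)
print(state["regions"]["B"])  # (0, 16)
```

Here loop var `n` lies outside the allocation scope of `B`, so it is filtered out of the map used for relaxation, which corresponds to the extra cost mentioned for the new flow.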

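The branch bounds from the `Intersect(A, B)` / `Difference(A, B)` item above can be illustrated on plain half-open intervals. This is a sketch under stated assumptions, not the utility added by the PR: `intersect` and `difference` are hypothetical helpers, and `difference` returns a list because removing the middle of an interval leaves two pieces, which a single interval cannot represent exactly.

```python
def intersect(a, b):
    """Intersection of two half-open intervals, or None when it is empty."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo < hi else None

def difference(a, b):
    """Pieces of `a` not covered by `b`, as a list of half-open intervals."""
    pieces = []
    left_hi = min(a[1], b[0])
    if a[0] < left_hi:
        pieces.append((a[0], left_hi))   # part of `a` below `b`
    right_lo = max(a[0], b[1])
    if right_lo < a[1]:
        pieces.append((right_lo, a[1]))  # part of `a` above `b`
    return pieces

loop_range = (0, 20)  # A: global bound of loop var i
cond_range = (2, 18)  # B: bound deduced from 2 <= i and i < 18

print(intersect(loop_range, cond_range))   # (2, 18)
print(difference(loop_range, cond_range))  # [(0, 2), (18, 20)]
```

In the crop example, the true branch sees i in [2, 18), while the false branch sees the two remaining strips of the loop range.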

