wrongtest opened a new pull request #9372:
URL: https://github.com/apache/tvm/pull/9372
Hi there~ This PR aims to enhance `CompactBufferRegion` and
`GetBlockAccessRegion` for conditional buffer accesses.
Currently, neither is aware of conditions, so both may produce non-tight
region bounds. Take a crop as an example:
```python
with T.block() as []:
    # B holds the 18 cropped elements
    B = T.alloc_buffer((18,), dtype="float32")
    for i in range(0, 20):
        with T.block():
            # B is only read when 2 <= i < 18; otherwise 0.0 is produced
            T.evaluate(T.if_then_else(2 <= i and i < 18, B[i - 2], 0.0,
                                      dtype="float32"))
```
The compact buffer pass would infer the accessed region of B as
B[-2:18] and reallocate a new buffer of size 18 - (-2) = 20, because it does not know
that B is only accessed when i is in [2, 18). Under that condition, only B[0:16] is actually read.
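The interval arithmetic behind this example can be checked with a minimal standalone sketch. It uses plain half-open integer intervals rather than TVM's `arith::IntSet`, and the helpers `intersect` and `relax_affine` are hypothetical, handling only an index of the form `i + offset`.

```python
from typing import Optional, Tuple

# Half-open integer interval [lo, hi); None stands for the empty set.
Interval = Optional[Tuple[int, int]]


def intersect(a: Interval, b: Interval) -> Interval:
    """Intersection of two half-open intervals (hypothetical helper)."""
    if a is None or b is None:
        return None
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo < hi else None


def relax_affine(dom: Interval, offset: int) -> Interval:
    """Relax the index expression `i + offset` over i in `dom`."""
    if dom is None:
        return None
    return (dom[0] + offset, dom[1] + offset)


loop_dom = (0, 20)   # for i in range(0, 20)
cond_dom = (2, 18)   # bound deduced from `2 <= i and i < 18`

# Condition-unaware: relax B[i - 2] over the full loop domain.
unaware = relax_affine(loop_dom, -2)
# Condition-aware: narrow the loop domain by the condition first.
aware = relax_affine(intersect(loop_dom, cond_dom), -2)

print(unaware, unaware[1] - unaware[0])  # (-2, 18) 20
print(aware, aware[1] - aware[0])        # (0, 16) 16
```

Narrowing before relaxing is what lets the compacted allocation shrink from 20 elements to 16.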
To take conditions into account, the PR makes the following changes:
- Relax the buffer region right at the access point

  Once conditions are considered, the domain map used to relax buffer accesses
  is no longer the same for every access in a block, so we cannot simply collect all
  inner buffer accesses at the block visit point and relax them all with
  `iter_dom_map_on_post_order_`. As a workaround, the PR instead records the
  allocation point (an index into `ancestor_loops_`) of each buffer, so that at a
  buffer access visit point it knows which loop vars should not be relaxed (generally
  the loops outside the allocation scope). `VisitBufferAccess()` no longer records the
  access onto a stack; instead it relaxes the access region and unions it immediately,
  using a globally maintained `dom_map_` that is aware of condition bounds.

  - origin: (1) visit block begin -> ... -> (2) visit inner buffer access
    (record the access) -> ... -> (3) visit block end (collect all accesses
    in scope and relax them with the same `iter_dom_map_on_post_order_`)
  - new: (1) visit block begin -> ... -> (2) visit inner buffer access
    (relax it with the current `dom_map_` and union the result, at the extra cost
    of excluding non-relaxed loop vars from `dom_map_`) -> ... -> (3) visit block end
- Implement the visit logic for `tir.IfThenElse` statements and `tir.if_then_else()` calls,
  updating the var bounds deduced from the condition in each branch.
- Implement an intset difference util function. If the global intset is A and the
  intset deduced from the condition is B, then the bounded intset in the true branch is
  `Intersect(A, B)` and the bounded intset in the false branch is `Difference(A, B)`.
- Clear the read/write annotations of non-opaque blocks in the
  `ConvertBlocksToOpaque` pass (possibly questionable)

  `ConvertBlocksToOpaque` is the pass that runs before `CompactBufferRegion`. Since
  block read/write annotations are unconditional, the conditional access information
  is lost when a block with a point access is converted:
  ```python
  for i in range(20):
      with T.block():
          # The annotation records B[i - 2] without the condition
          T.reads([B[i - 2]])
          T.evaluate(T.if_then_else(2 <= i and i < 18, B[i - 2], 0.0,
                                    dtype="float32"))
  ```
  For a buffer allocated outside the block scope, the compact buffer pass
  currently does not look into the detailed accesses inside the block but uses the
  block's annotations, so it still tries to relax `B[i - 2]` without condition awareness.
  The PR tries to overcome this problem by distinguishing two cases:
  - (1) the block is opaque and has reads/writes annotations: treat the
    block as opaque as before and just use the annotations.
  - (2) the block is opaque but its reads/writes are empty: treat the block as
    "transparent" and visit the buffer accesses within it.
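The branch-wise bound update from the list above (`Intersect(A, B)` on the true branch, `Difference(A, B)` on the false branch) can be sketched in plain Python. This is a hypothetical illustration, not TVM's `arith::IntSet` API: `intersect`, `difference`, and `branch_doms` are made-up helpers, and `difference` is only assumed to behave like the PR's util. Because a single interval cannot represent a hole, the sketch conservatively returns A unchanged when B lies strictly inside A.

```python
from typing import Dict, Optional, Tuple

# Half-open interval [lo, hi) with int bounds or +/-inf; None is the empty set.
Interval = Optional[Tuple[float, float]]


def intersect(a: Interval, b: Interval) -> Interval:
    if a is None or b is None:
        return None
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo < hi else None


def difference(a: Interval, b: Interval) -> Interval:
    """Conservative single-interval approximation of A minus B."""
    if a is None:
        return None
    overlap = intersect(a, b)
    if overlap is None:
        return a                    # disjoint: nothing removed
    lo, hi = a
    if overlap == a:
        return None                 # B covers A entirely
    if overlap[0] == lo:
        return (overlap[1], hi)     # B clips the lower end of A
    if overlap[1] == hi:
        return (lo, overlap[0])     # B clips the upper end of A
    return a                        # B strictly inside A: keep A (over-approximation)


def branch_doms(dom_map: Dict[str, Interval], var: str, cond: Interval):
    """Var domains for the true/false branches of a condition deducing `var in cond`."""
    true_map, false_map = dict(dom_map), dict(dom_map)
    true_map[var] = intersect(dom_map[var], cond)
    false_map[var] = difference(dom_map[var], cond)
    return true_map, false_map


dom_map = {"i": (0, 20)}

# Crop example: `2 <= i and i < 18` deduces i in [2, 18).
t, f = branch_doms(dom_map, "i", (2, 18))
print(t["i"], f["i"])  # (2, 18) (0, 20)

# A one-sided condition such as `i < 5` deduces i in (-inf, 5).
t, f = branch_doms(dom_map, "i", (float("-inf"), 5))
print(t["i"], f["i"])  # (0, 5) (5, 20)
```

In the crop case the false branch keeps the full domain, which is harmless because that branch does not touch B; with a one-sided condition, `Difference` is exact and the false branch also gets a tighter bound.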