Lunderberg commented on PR #16938:
URL: https://github.com/apache/tvm/pull/16938#issuecomment-2080617986
Agreed, there's a tradeoff, and replacing copies with views isn't something
to be applied in every case.
* Kernel Compatibility
Currently, the default `tir::Buffer` used by TE (used by topi, and used in
most Relax legalizations) is defined with `elem_offset` of zero.
* Option 1: Change TE's default to use arbitrary `elem_offset` for the
default `tir::Buffer`. (Change the default of `elem_offset` from zero to a
fresh TIR variable.) Not a good option, as this would provide a weaker
alignment check that existing kernels may assume.
* Option 2: Change TE's default to use a dynamic `elem_offset` that
preserves alignment for the default `tir::Buffer`. (Change the default of
`elem_offset` from zero to a fresh TIR variable, and change the default of
`offset_factor` to `kAllocAlignment // dtype.bytes()`.) This is better than
option (1), because `data + elem_offset*dtype.bytes()` is still aligned.
However, still not a good option, as TIR kernels that call external functions
typically pass `buf.data` as an argument, which doesn't include the
`elem_offset`.
* Option 3: Keep TE's default the same, and instead apply a
post-processing step. For each `tir::Buffer` argument with `elem_offset` of
zero, replace it with a new `tir::Buffer` with `elem_offset` allowed, and
`offset_factor` to maintain alignment. Within the body of the function, add a
definition of the old `tir::Buffer` argument either as a `MatchBufferRegion` or
a `attr::buffer_bind_scope`.
I'm liking Option 3 the most, as it adds flexibility at the calling
scope. Unlike option 1, it still provides the same alignment guarantees for
kernels. Unlike option 2, it wouldn't require re-writing a very large number
of existing kernels.
* Performance
* Zero-Copy vs Contiguous Arrays: The `CreateView` method is only
allowed to make contiguous views of contiguous arrays. It is not able to make
strided views, so this tradeoff isn't applicable.
* Zero-Copy vs Alignment: The `CreateView` method can make views whose
`data + byte_offset` isn't aligned to
`kAllocAlignment`. The restriction of aligned `data + byte_offset` would
still be applied at the `PrimFunc` level, so this wouldn't cause a performance
hit.
* Zero-Copy vs Memory Footprint: The `NDArray` returned by `CreateView`
shares the same reference-counted allocation as the source array into which it
views. If the view outlives the source, then the memory footprint of the view
is larger than a copy would have been.
Not an issue for the `CreateView` method itself, but should be a
consideration for optimization passes that replace copies with views. (e.g.
replacing `R.take`, `R.split`, or `R.strided_slice` with a view) Should only
be done in cases where the source array outlives the view.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]