Lunderberg commented on PR #12949: URL: https://github.com/apache/tvm/pull/12949#issuecomment-1264013889
Rats. I figured it was worth a shot. Two additional questions come to mind:

1. Would it work to construct a `DLTensor` of shape `[1]`, then iterate over the elements by updating `DLTensor::byte_offset` for each one? I *think* that would work within the public methods using `DLTensor`, but the per-element overhead might be large.
2. If the per-element overhead of a virtual function call is large, would it be worth using `DetectIterMap` to attempt to find regions that are contiguous in both the source and destination layouts, in order to copy each of those regions as a whole?

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
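To make the second question concrete, here is a rough sketch of the idea in plain Python. It does not use `DetectIterMap`; it is a brute-force stand-in that walks every source offset and groups the ones whose destination offsets are also consecutive. The names `contiguous_runs`, `copy_by_runs`, and `tiled_offset`, and the example layout, are hypothetical and not taken from the PR.

```python
from typing import Callable, List, Tuple


def contiguous_runs(
    dst_offset: Callable[[int], int], num_elements: int
) -> List[Tuple[int, int, int]]:
    """Group flat source offsets into (src_start, dst_start, length) runs.

    A run is a maximal range of consecutive source offsets whose
    destination offsets are consecutive as well, so the whole run could be
    moved with one bulk copy instead of one call per element.
    """
    runs: List[Tuple[int, int, int]] = []
    if num_elements == 0:
        return runs

    src_start, dst_start, length = 0, dst_offset(0), 1
    for i in range(1, num_elements):
        d = dst_offset(i)
        if d == dst_start + length:
            # Still contiguous in both layouts: extend the current run.
            length += 1
        else:
            # Destination jumped: close the run and start a new one.
            runs.append((src_start, dst_start, length))
            src_start, dst_start, length = i, d, 1
    runs.append((src_start, dst_start, length))
    return runs


def copy_by_runs(src: list, dst: list, runs: List[Tuple[int, int, int]]) -> None:
    """Copy each run with one slice assignment (stand-in for a bulk copy)."""
    for s, d, n in runs:
        dst[d : d + n] = src[s : s + n]


def tiled_offset(k: int) -> int:
    """Hypothetical layout: row-major [4, 8] source, destination stored as
    [2, 4, 4] with logical (i, j) mapped to (j // 4, i, j % 4)."""
    i, j = divmod(k, 8)
    return (j // 4) * 16 + i * 4 + (j % 4)


runs = contiguous_runs(tiled_offset, 32)
src = list(range(32))
dst = [None] * 32
copy_by_runs(src, dst, runs)
```

The per-element walk here is only for illustration. The point of asking about `DetectIterMap` is that the run boundaries could, in principle, be derived symbolically from the layout transform rather than by enumerating every element.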
