sergey-grovety opened a new pull request #9233: URL: https://github.com/apache/tvm/pull/9233
# TVM operations implemented with Cortex-M7 SIMD instructions

### nn.conv2d
- We added an implementation of the gemm function for 16-bit input.
- The restriction that `shape[-1]` must be a multiple of 4 has been removed.
- The 8-bit intrinsic requires data preparation beforehand, and for small tensors this preparation takes too much time, so we added a size check and a simple loop to handle that case.
- Optimizations:
  - calculations were moved from inside the intrinsic to outside it;
  - one buffer, which was not actually in use, was cut down drastically, reducing memory requirements by almost half.

### nn.max_pool2d
- Implemented with the `__SSUB8` and `__SEL` intrinsics, which process four 8-bit input values at once; this gives a notable speedup.
- Feature: the implementation supports input data that is not word-aligned.
- Feature: data sizes do not need to be a multiple of 4.
- `memset` is used to initialize the buffer with minimum values, for maximum speed.

### nn.avg_pool2d
- There is no intrinsic that sums four 8-bit values, so the implementation is possible only for 16-bit data.
- The `__SMLAD` intrinsic is used to process two 16-bit values at once.
- Feature: the implementation supports input data that is not word-aligned.
- Feature: data sizes do not need to be a multiple of 4.

### nn.dense
Implemented with the same gemm method described above.

### nn.conv1d
A special case of gemm usage, with one of the data dimensions equal to 1.

### nn.avg_pool1d
Implemented for the NCW layout with the same intrinsic as the 2D version of the operation.

### nn.max_pool1d
Implemented with the same intrinsic as the 2D version of the operation.

# Benchmarking
HW platform: STM32F746 Nucleo; compiler: GCC 10 with `-O3`.

To enable intrinsic code generation, specify the `-march=armv7e-m` parameter in the target:

`target_str = f"c -keys=arm_cpu -mcpu=cortex-m7 -march=armv7e-m -model=stm32f746xx -runtime=c -link-params=1 --executor=aot --unpacked-api=1 --interface-api=c"`

## Results

| Model | No intrinsic, ms | Intrinsic enabled, ms |
| -- | -- | -- |
| mnist8 | 8.625 | 6.574 |
| cifar10 | 788.36 | 144.59 |

- No intrinsic: the `-march` parameter is not specified, so no intrinsic code is generated.
- Intrinsic enabled: `-march=armv7e-m`.
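The 16-bit gemm behind nn.conv2d and nn.dense can be built on `__SMLAD`, the same intrinsic the PR uses for 16-bit nn.avg_pool2d: it multiplies the two signed 16-bit halves of two words and adds both products to a 32-bit accumulator. The sketch below is a pure-Python emulation for illustration only, not the PR's C code. The helper names (`smlad`, `dot_q15`, `gemm_q15`, `avg_q15`) are hypothetical, the scalar tail for odd lengths is one plausible way to avoid a length restriction, and the floor-division rounding in `avg_q15` is an assumption.

```python
# Pure-Python emulation of the Arm __SMLAD intrinsic, for illustration only.

def to_int32(v):
    """Wrap to a signed 32-bit value, like the hardware accumulator."""
    v &= 0xFFFFFFFF
    return v - (1 << 32) if v >= (1 << 31) else v

def pack2(lo, hi):
    """Pack two signed 16-bit values into one 32-bit word (lo in bits 15:0)."""
    return (lo & 0xFFFF) | ((hi & 0xFFFF) << 16)

def halves(word):
    """Split a 32-bit word into its signed 16-bit halves (lo, hi)."""
    lo, hi = word & 0xFFFF, (word >> 16) & 0xFFFF
    return (lo - 0x10000 if lo >= 0x8000 else lo,
            hi - 0x10000 if hi >= 0x8000 else hi)

def smlad(x, y, acc):
    """acc + x.lo * y.lo + x.hi * y.hi (emulates __SMLAD; the Q flag is not modeled)."""
    xl, xh = halves(x)
    yl, yh = halves(y)
    return to_int32(acc + xl * yl + xh * yh)

def dot_q15(a, b):
    """int16 inner product: two multiply-accumulates per SMLAD, scalar tail for odd length."""
    acc = 0
    even = len(a) - len(a) % 2
    for i in range(0, even, 2):
        acc = smlad(pack2(a[i], a[i + 1]), pack2(b[i], b[i + 1]), acc)
    if even < len(a):
        acc = to_int32(acc + a[-1] * b[-1])
    return acc

def gemm_q15(a_rows, bt_rows):
    """C[i][j] = dot(A[i], Bt[j]); B is passed transposed so both operands are read row-wise."""
    return [[dot_q15(row, col) for col in bt_rows] for row in a_rows]

def avg_q15(values):
    """Average pooling over int16 values: SMLAD against packed ones adds two values per call."""
    total = dot_q15(values, [1] * len(values))
    return total // len(values)  # floor division is an assumed rounding mode

print(gemm_q15([[1, 2, 3], [-4, 5, 6]], [[7, 8, 9], [1, 0, -1]]))  # [[50, -2], [66, -10]]
print(avg_q15([100, -300, 32767, 5, 7]))  # 6515
```

With K = 3 in the gemm example, the first two products of each inner product go through one SMLAD and the third through the scalar tail.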
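The nn.conv2d note about data preparation before the 8-bit intrinsic can be illustrated under an assumption: that the preparation widens int8 operands into int16 pairs so `__SMLAD` can consume them, a common approach on Armv7E-M using `__SXTB16` (sign-extend bytes 0 and 2) together with a rotate by 8 bits for bytes 1 and 3. The pure-Python sketch below emulates that path and falls back to a simple loop for short vectors. The threshold `SMALL_K = 16` and all function names are hypothetical; the PR does not say how its check is defined.

```python
SMALL_K = 16  # hypothetical cut-off for illustration; the PR does not give the real value

def to_int32(v):
    """Wrap to a signed 32-bit value."""
    v &= 0xFFFFFFFF
    return v - (1 << 32) if v >= (1 << 31) else v

def s16(v):
    """Interpret the low 16 bits of v as a signed value."""
    v &= 0xFFFF
    return v - 0x10000 if v >= 0x8000 else v

def pack4(vals):
    """Pack four signed 8-bit values into one word (lane 0 in the low byte)."""
    word = 0
    for i, v in enumerate(vals):
        word |= (v & 0xFF) << (8 * i)
    return word

def ror8(word):
    """Rotate a 32-bit word right by 8 bits."""
    return ((word >> 8) | (word << 24)) & 0xFFFFFFFF

def sxtb16(word):
    """Sign-extend bytes 0 and 2 into two 16-bit halves (emulates __SXTB16)."""
    lo, hi = word & 0xFF, (word >> 16) & 0xFF
    lo = lo - 256 if lo >= 128 else lo
    hi = hi - 256 if hi >= 128 else hi
    return (lo & 0xFFFF) | ((hi & 0xFFFF) << 16)

def smlad(x, y, acc):
    """acc + x.lo * y.lo + x.hi * y.hi with signed 16-bit halves (emulates __SMLAD)."""
    return to_int32(acc + s16(x) * s16(y) + s16(x >> 16) * s16(y >> 16))

def dot_s8(a, b):
    """int8 inner product with a cheap path for short vectors."""
    if len(a) < SMALL_K:
        # Small input: skip the widening step and use a plain loop.
        return sum(x * y for x, y in zip(a, b))
    acc = 0
    end = len(a) - len(a) % 4
    for i in range(0, end, 4):
        wa, wb = pack4(a[i:i + 4]), pack4(b[i:i + 4])
        acc = smlad(sxtb16(wa), sxtb16(wb), acc)              # lanes 0 and 2
        acc = smlad(sxtb16(ror8(wa)), sxtb16(ror8(wb)), acc)  # lanes 1 and 3
    for i in range(end, len(a)):  # scalar tail for lengths not a multiple of 4
        acc = to_int32(acc + a[i] * b[i])
    return acc

print(dot_s8([1, -2, 3], [4, 5, -6]))          # -24 (short input, plain loop)
print(dot_s8(list(range(-10, 10)), [2] * 20))  # -20 (widened SMLAD path)
```

In Python both paths cost about the same; the sketch only shows the control flow of a size check in front of the prepared SIMD path.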
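The nn.max_pool2d bullets can be made concrete with a pure-Python emulation of `__SSUB8` and `__SEL`. On hardware, `__SSUB8` subtracts four signed bytes lane by lane and sets the APSR.GE flag of each lane whose difference is non-negative; `__SEL` then takes each byte from its first operand where the flag is set and from the second otherwise, so the pair yields a four-lane maximum. In this sketch the GE bits are returned explicitly rather than kept in CPU flags, the scalar tail is one way to support sizes that are not a multiple of 4, and filling the output with -128 mirrors `memset(out, 0x80, n)`. Word alignment is not modeled, and the helper names are hypothetical.

```python
# Pure-Python emulation of the Arm __SSUB8 / __SEL pair, for illustration only.

def pack4(vals):
    """Pack four signed 8-bit values into one 32-bit word (lane 0 in the low byte)."""
    word = 0
    for i, v in enumerate(vals):
        word |= (v & 0xFF) << (8 * i)
    return word

def unpack4(word):
    """Unpack a 32-bit word into four signed 8-bit values."""
    out = []
    for i in range(4):
        b = (word >> (8 * i)) & 0xFF
        out.append(b - 256 if b >= 128 else b)
    return out

def ssub8(x, y):
    """Lane-wise signed byte subtraction; returns (result word, GE bits)."""
    xs, ys = unpack4(x), unpack4(y)
    result = pack4([a - b for a, b in zip(xs, ys)])
    ge = [a - b >= 0 for a, b in zip(xs, ys)]  # evaluated before 8-bit truncation
    return result, ge

def sel(x, y, ge):
    """Take each byte from x where its GE bit is set, otherwise from y."""
    word = 0
    for i in range(4):
        src = x if ge[i] else y
        word |= src & (0xFF << (8 * i))
    return word

def max_pool_window(rows):
    """Element-wise max over equal-length int8 rows (for example, one pooling window)."""
    n = len(rows[0])
    out = [-128] * n  # like memset(out, 0x80, n): INT8_MIN in every byte
    vec_end = n - n % 4
    for row in rows:
        for j in range(0, vec_end, 4):
            acc, new = pack4(out[j:j + 4]), pack4(row[j:j + 4])
            _, ge = ssub8(new, acc)  # GE[i] set where new[i] >= acc[i]
            out[j:j + 4] = unpack4(sel(new, acc, ge))
        for j in range(vec_end, n):  # scalar tail
            out[j] = max(out[j], row[j])
    return out

print(max_pool_window([[1, -5, 127, -128, 3, 9], [4, -6, 100, -127, -2, 10]]))
# [4, -5, 127, -127, 3, 10]
```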
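One way to read the nn.conv1d note is that conv1d is the conv2d gemm with the height of both the data and the kernel equal to 1. The sketch below assumes a single batch item in channels-last (W x C_in) layout, a K x C_in x C_out kernel, stride 1 and no padding; it lowers the convolution to im2col plus a plain gemm and compares the result with a direct loop. The layout and function names are illustrative assumptions, not the PR's code.

```python
def gemm(a, b):
    """Plain C = A @ B on lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def conv1d_as_gemm(x, w):
    """Valid, stride-1 conv1d for one batch item, expressed as im2col + gemm.

    x: W x C_in input; w: K x C_in x C_out kernel.
    """
    k = len(w)
    out_w = len(x) - k + 1
    # im2col: each output position becomes one row of K * C_in values (t-major, then channel)
    cols = [[v for t in range(k) for v in x[p + t]] for p in range(out_w)]
    # flatten the kernel to a (K * C_in) x C_out matrix in the same order
    wmat = [w[t][c] for t in range(k) for c in range(len(w[0]))]
    return gemm(cols, wmat)

def conv1d_direct(x, w):
    """Reference implementation with explicit loops."""
    k, c_in, c_out = len(w), len(w[0]), len(w[0][0])
    return [[sum(x[p + t][c] * w[t][c][o] for t in range(k) for c in range(c_in))
             for o in range(c_out)]
            for p in range(len(x) - k + 1)]

x = [[1, 2], [3, 4], [5, 6], [7, 8]]       # W = 4, C_in = 2
w = [[[1, 0], [0, 1]], [[1, 1], [-1, 0]]]  # K = 2, C_in = 2, C_out = 2
print(conv1d_as_gemm(x, w))  # [[0, 5], [2, 9], [4, 13]]
print(conv1d_direct(x, w))   # [[0, 5], [2, 9], [4, 13]]
```

Because the gemm only sees a (W_out x K*C_in) by (K*C_in x C_out) product, the same kernel that serves nn.conv2d and nn.dense can be reused unchanged.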
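Assuming the Results table reports per-inference time in milliseconds, the speedup from enabling the intrinsics follows directly from its two columns:

```python
# Latencies in ms, copied from the Results table: (no intrinsic, intrinsic enabled)
latency_ms = {
    "mnist8": (8.625, 6.574),
    "cifar10": (788.36, 144.59),
}

speedups = {model: base / simd for model, (base, simd) in latency_ms.items()}
for model, s in speedups.items():
    print(f"{model}: {s:.2f}x")  # mnist8: 1.31x, cifar10: 5.45x
```

That is roughly a 1.3x speedup for mnist8 and 5.5x for cifar10.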
