vvchernov opened a new pull request #8781: URL: https://github.com/apache/tvm/pull/8781
**WORK IN PROGRESS.** The following issues were observed during testing:

1. The current ONNX frontend implementation with the activation fix fails the accuracy test for the stacked modification after tuning. Without tuning there is no issue, and the stacked bidirectional case passes (or the failure is simply not observed).
2. The new PyTorch GRU has the same problem. Note: the ONNX and PyTorch GRU implementations are close but not identical.
3. A more advanced GRU implementation is more than 20% faster, but shows stable accuracy failures after TVM tuning and none without it. It is kept local until the issue is resolved.
4. The ONNX GRU has issues related to an incorrect implementation, but the unit test only checks accuracy and does not catch them. More detailed investigation is needed; this could potentially be a big problem.
5. Performance results differ strongly between the ONNX and the new PyTorch GRU. Replacing the current ONNX frontend implementation with the new one is planned.

The GRU cell was unified and implemented in common.py; it is used by the PyTorch frontend of TVM. A critical fix was also made on the ONNX GRU implementation side.

Performance tests for different GRU modifications before and after the new implementation were carried out. The results are collected in the table below.

Table 1. Average time per run (microseconds) over 10000 runs. The following parameters are used (small input size): with biases = True, batch first = True, linear before reset = True, feature size = 5, hidden size = 10, number of stacked layers = 2, sequence length = 3, batch size = 1, trials number = 100.
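For reference, the cell semantics the unified implementation is expected to match can be sketched in NumPy as follows. This is a minimal illustrative sketch of one step of the ONNX `GRU` operator (gate order z, r, h), including the `linear_before_reset` attribute listed among the test parameters; the helper names are my own, not code from common.py:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_cell(x, h, W, R, b_w, b_r, linear_before_reset=True):
    """One step of the ONNX GRU operator (gate order: z, r, h).

    x: (feature,) input, h: (hidden,) previous hidden state.
    W: (3*hidden, feature) input weights, R: (3*hidden, hidden)
    recurrent weights, b_w / b_r: (3*hidden,) input / recurrent biases.
    Illustrative sketch, not the actual TVM frontend code.
    """
    hidden = h.shape[0]
    wz, wr, wh = np.split(W @ x + b_w, 3)
    rz, rr, rh = np.split(R @ h + b_r, 3)
    z = sigmoid(wz + rz)  # update gate
    r = sigmoid(wr + rr)  # reset gate
    if linear_before_reset:
        # reset gate applied AFTER the recurrent linear transform
        n = np.tanh(wh + r * rh)
    else:
        # reset gate applied to the hidden state BEFORE the transform
        n = np.tanh(wh + R[2 * hidden:] @ (r * h) + b_r[2 * hidden:])
    return (1.0 - z) * n + z * h
```

The two branches differ whenever the recurrent bias is nonzero, which is why `linear before reset = True` has to be handled explicitly rather than folded into the standard GRU formulation.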
TVM target is `llvm -mcpu=core-avx2`.

| Frontend name / GRU type | uni | b | s | sb |
| :----------------------- | :---: | :---: | :---: | :---: |
| Onnx                     | 23.3  | 48.6  | 46.4  | 100.8 |
| Onnx tuned               | 8.37  | 15.25 | 14.35 | 28.1  |
| Pytorch implemented      | 7.42  | 13.0  | 12.8  | 24.7  |
| Pytorch impl tuned       | 3.2   | 4.91  | 4.77  | 8.43  |
| Onnxruntime              | 13.06 | 16.89 | 19.2  | 25.9  |

There are several GRU types: uni – unidirectional, b – bidirectional, s – stacked (2 layers are used in the tests), sb – stacked bidirectional.

The PyTorch GRU compiled by TVM is faster than both the tuned ONNX GRU and onnxruntime, and the tuned PyTorch version is markedly faster still.
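The "average time per run over 10000 runs" metric in Table 1 can be reproduced with a simple timing loop like the one below. This harness is hypothetical (the PR does not show its benchmark script); `run` stands in for any zero-argument callable such as a compiled TVM module's run method:

```python
import time

def avg_time_per_run_us(run, n_runs=10000, n_warmup=100):
    """Average wall-clock time per call, in microseconds.

    `run` is any zero-argument callable. Warm-up iterations are
    executed first and excluded from the measurement, so one-time
    costs (lazy initialization, caches) do not skew the average.
    """
    for _ in range(n_warmup):
        run()
    start = time.perf_counter()
    for _ in range(n_runs):
        run()
    elapsed = time.perf_counter() - start
    return elapsed * 1e6 / n_runs
```

Averaging over many runs and discarding warm-up iterations is what makes microsecond-scale numbers like those in the table stable enough to compare across frontends.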
