altanh opened a new pull request, #11531:
URL: https://github.com/apache/tvm/pull/11531

   This PR adds a TE implementation of LSTM (with optional modifications, 
similar to those in 
https://github.com/apache/tvm/blob/main/python/tvm/relay/frontend/common.py#L774),
 using the `te.scan` construct, so that the recurrent loop is a true 
sequential loop rather than being statically unrolled. The compute should 
support symbolic sequence lengths.
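   To illustrate the `te.scan` construct the PR builds on (this is not the PR's LSTM kernel, just a minimal sketch): a running sum over the time axis, where the sequence length `n` stays symbolic and the state at step `t` depends on the state at `t - 1`.

   ```python
   import numpy as np
   import tvm
   from tvm import te

   n = te.var("n")  # symbolic sequence length
   m = 4
   X = te.placeholder((n, m), name="X")
   s_state = te.placeholder((n, m), name="state")
   # initial state: first row of X
   s_init = te.compute((1, m), lambda _, j: X[0, j], name="init")
   # sequential update: state[t] = state[t - 1] + X[t]
   s_update = te.compute((n, m), lambda t, j: s_state[t - 1, j] + X[t, j],
                         name="update")
   S = te.scan(s_init, s_update, s_state, inputs=[X], name="scan")

   sch = te.create_schedule(S.op)
   f = tvm.build(sch, [X, S], "llvm")

   x_np = np.random.rand(7, m).astype("float32")
   x = tvm.nd.array(x_np)
   out = tvm.nd.array(np.zeros((7, m), dtype="float32"))
   f(x, out)  # matches np.cumsum(x_np, axis=0)
   ```

   The LSTM in this PR follows the same pattern, except the update stage carries `(h, c)` state and contains the gate computations.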
   
   Missing from this PR:
   - optimized schedule for any target
   - corresponding higher-level Relay op
   - attempts to use metaschedule (more on this later)
   
   I'll send a follow-up PR for the Relay op, but scheduling the LSTM might 
take a while (if anyone is interested, please feel free to take a stab!). The 
main things to optimize are the dense operations within the kernel (the 
initial input-hidden dense, the recurrent hidden-hidden dense, and the 
hidden-projection dense); I couldn't figure out a great way to reuse existing 
schedules here...
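   For reference, the three dense operations in question correspond to the standard LSTM cell semantics, sketched below in NumPy (names and argument layout are illustrative, not the PR's API; weights are stacked `[i, f, g, o]` along the first axis):

   ```python
   import numpy as np

   def lstm_cell(x_t, h_prev, c_prev, W_ih, W_hh, b, W_proj=None):
       """One LSTM step. W_ih: (4H, I), W_hh: (4H, H), b: (4H,)."""
       sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
       # the input-hidden and hidden-hidden denses (the main scheduling targets)
       gates = x_t @ W_ih.T + h_prev @ W_hh.T + b
       i, f, g, o = np.split(gates, 4, axis=-1)
       c = sigmoid(f) * c_prev + sigmoid(i) * np.tanh(g)
       h = sigmoid(o) * np.tanh(c)
       if W_proj is not None:
           h = h @ W_proj.T  # optional hidden-projection dense
       return h, c
   ```

   The input-hidden dense can be hoisted out of the scan and batched over time, but the hidden-hidden (and projection) denses are inherently inside the sequential loop, which is what makes reusing existing dense schedules awkward.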
   
   Things I am hoping to try:
   - Fix some variant of LSTM and write an S-TIR kernel for it, then try to 
schedule individual blocks (reusing existing schedules where possible). Since 
LSTM has many optional components, I'm not sure how easy it would be to do 
TVMScript-level metaprogramming to inject the optional computations, etc.
   - Once the Relay op is up, add a cuDNN strategy as an option for NVIDIA GPUs.
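   One way to sidestep the TVMScript metaprogramming question is to inject the optional stages at the Python/TE level instead, building the extra compute only when requested. A hypothetical sketch (names like `make_hidden_stage` are illustrative, not from this PR):

   ```python
   from tvm import te

   def make_hidden_stage(h_units, proj_units=None):
       """Return (placeholders, output): the hidden state, optionally
       followed by a projection dense. The projection stage is only
       constructed when proj_units is given."""
       n = te.var("n")
       H = te.placeholder((n, h_units), name="H")
       if proj_units is None:
           return [H], H
       W_p = te.placeholder((proj_units, h_units), name="W_proj")
       k = te.reduce_axis((0, h_units), name="k")
       P = te.compute(
           (n, proj_units),
           lambda t, j: te.sum(H[t, k] * W_p[j, k], axis=k),
           name="proj",
       )
       return [H, W_p], P
   ```

   The same flag-driven construction extends to the other optional LSTM modifications; the cost is that each variant yields a different TE graph, so schedules have to tolerate the stage being present or absent.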
   
   Regarding metascheduling: the current `CreatePrimFunc` conversion from TE -> 
S-TIR doesn't support scan operations. I have a hack that makes this conversion 
work, but I'm hitting snags with schedule rules, primitives, and 
postprocessors (the outer scan axis seems to break a lot of assumptions). I 
can try to clean up this conversion if that's valuable, but I'm also curious 
whether anyone is interested in tackling this by relaxing the constraints on 
blocks to support an outer scan axis.
   
   cc @vinx13 @junrushao1994 @tkonolige @michalpiszczek @masahi 
   
   Additional thanks to @vinx13 and @zxybazh for helping debug metaschedule 
issues (I hope this PR helps as a concrete starting point for getting things 
working); feel free to cc others who may be interested. And thanks 
@junrushao1994 for the very helpful LSTM example from ~5 (!) years ago 
(https://github.com/apache/tvm/blob/main/apps/topi_recipe/rnn/lstm.py), which 
I used as a starting point.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
