ANSHUMAN87 edited a comment on pull request #7048:
URL: https://github.com/apache/tvm/pull/7048#issuecomment-742610846


   > Sorry about the delay in reviewing.
   > 
   > I think there are a couple of design points here that we should consider so that we can take a well-informed decision on this direction.
   > 
   >     1. The advantage of keeping the sparse weights representation all the way through is that it keeps the weight size low in the final generated binary, which helps with static size / footprint at the expense of some runtime (obviously). For use cases where we go through this with BYOC, or others where there are weight compression techniques, this should clear up reasonably OK; however, where we fall back to traditional TVM compilation on CPUs and GPUs there are no weight compression techniques, and thus this contributes to rodata bloat or flash bloat.
   > 
   >     2. Even if it doesn't make sense to keep this in the final representation, to be expanded at runtime either as an op for the Relay VM or a new op in the graph runtime - does this operator appear in other frameworks, and if so, does it make sense to keep this behaviour common across the stack so we can reuse this logic for other frontends as well? I.e. the various frontends lower to a relay.op.densify() and the result of that is something that represents a dense weight tensor; further legalization could end up creating this dense weight tensor. One of the principles in compiler design is to try and bind things as late as possible, to give us a chance of optimizing and being able to recover easily without too much reverse engineering. So this comes back to that as well.
   > 
   > 
   > TBH I would probably prefer #2 as an approach, to try and reduce any duplication in the frontends and give other compilers, or others who use Relay, a chance to get at the densify op. I don't think we can use it for any of our work instantly. If performance or code size becomes an issue we'll need to reconsider a runtime or partial runtime representation of sparse weights, at which point #1 would be a stepping stone towards it.
   > 
   > These are my thoughts on the design.
   > 
   > regards
   > Ramana
   
   Thanks @u99127 for your inputs! I concur with most of your arguments.
   I think we have 2 possible approaches to bring sparse ConvNet models onto TVM:
   1. Keep the sparse weights and convert them to dense at runtime, as sparse inference is not complete in TVM currently.
   2. Optimize out the sparse weights by transforming them to dense while on-boarding to TVM.
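
   To make approach 2 concrete: on a TFLite model the frontend would decode the sparsity metadata and fold the weights into a dense constant at import time. Below is a minimal NumPy sketch of that decode for a 2-D tensor whose second dimension is CSR-encoded; the helper name `decode_csr_2d` and the exact layout are my assumptions for illustration, not existing TVM or TFLite API:

   ```python
   import numpy as np

   def decode_csr_2d(segments, indices, values, shape):
       # Expand a 2-D weight whose second dimension is CSR-encoded
       # (TFLite-style dim_metadata: dim 0 dense, dim 1 sparse) into a
       # dense matrix -- the work a runtime Densify op would do, folded
       # here into a one-time transformation during import.
       rows, _ = shape
       dense = np.zeros(shape, dtype=np.asarray(values).dtype)
       for r in range(rows):
           start, end = segments[r], segments[r + 1]
           dense[r, indices[start:end]] = values[start:end]
       return dense

   # Row 0 holds {col 1: 5}, row 1 is empty, row 2 holds {col 0: 9, col 3: 7}.
   w = decode_csr_2d([0, 1, 1, 3], [1, 0, 3], [5, 9, 7], (3, 4))
   ```

   After this step the graph only ever sees a plain dense constant, which is why no runtime op is needed.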
   
   When I went through the first approach, I found the performance was very poor, and the Sparse_to_Dense op resulted in stack corruption when fed with higher-dimension inputs, hence I switched to the second approach, which makes sense if we think from the TFLite perspective: the TFLite framework optimizes out the Densify op when it comes to sparse inference. As my future work will include sparse inference as well, I think at this point it will suffice to keep the densified weights at runtime.
   
   However, as I mentioned earlier, if we want to keep the sparse weights in the TVM runtime, the current PR changes support that as well, with the steps I mentioned in my previous comment, provided we first fix the Sparse_to_Dense limitations (which I am looking into right now).
   I think keeping a Densify op is not a good option, as it would be a duplicate of the Sparse_to_Dense op.
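
   The overlap is easy to see: a Densify op would compute exactly the scatter that Sparse_to_Dense already defines. A NumPy model of those semantics, with indices/values in COO form (a sketch of the op's contract, not the TVM implementation):

   ```python
   import numpy as np

   def sparse_to_dense_ref(sparse_indices, output_shape, sparse_values,
                           default_value=0):
       # Reference semantics of a Sparse_to_Dense-style op: scatter
       # sparse_values at sparse_indices into an output_shape tensor
       # pre-filled with default_value. A Densify op would compute the
       # same thing, which is why adding one would be duplication.
       out = np.full(output_shape, default_value,
                     dtype=np.asarray(sparse_values).dtype)
       out[tuple(np.asarray(sparse_indices).T)] = sparse_values
       return out

   d = sparse_to_dense_ref([[0, 0], [1, 2]], (2, 3), [4, 6])
   ```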
   
   Maybe we can make this op conversion a configurable user option in the frontend.
   For now, by default we optimize out this op and keep the dense weights at runtime.
   When I am ready with an appropriate solution for the Sparse_to_Dense op, we can make that the default and leave the first approach as a user choice, so that users can choose between performance and compression.
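
   As a toy sketch of what that switch could look like at the frontend boundary (the flag name `densify_on_import` and both helpers are hypothetical, not existing TVM frontend API):

   ```python
   import numpy as np

   def fold_to_dense(indices, values, shape):
       # Approach 2: expand once at import and embed a dense constant.
       dense = np.zeros(shape, dtype=np.asarray(values).dtype)
       dense[tuple(np.asarray(indices).T)] = values
       return dense

   def import_weight(indices, values, shape, densify_on_import=True):
       # Hypothetical user option: fold sparse weights to a dense
       # constant at import time (performance, the proposed default),
       # or keep the sparse pieces so the graph expands them at runtime
       # via Sparse_to_Dense (compression, left as a user choice).
       if densify_on_import:
           return "dense_constant", fold_to_dense(indices, values, shape)
       return "sparse_to_dense_expr", (indices, values, shape)

   kind, w = import_weight([[0, 1]], [3], (2, 2))
   ```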
   Let me know your thoughts on this. TIA!
   
   

