u99127 commented on pull request #7048:
URL: https://github.com/apache/tvm/pull/7048#issuecomment-742526188


   > > Thanks @ANSHUMAN87. Overall looks pretty good, but I am a little 
confused at exactly what is happening in this pass. Is it converting matrices 
from sparse to dense at compile time?
   > 
   > Thanks @tkonolige for the review! The Densify op does not currently need 
to be present in the TVM runtime, because its purpose is to convert sparse 
weights (constants) to dense form, so I have optimized it out. However, should 
it ever be required in the TVM runtime, we can always implement it with the 
sparse_to_dense op, using the indices produced by the utility 
prepare_dense_matrix_from_sparse().
   > 
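   For readers following along, the compile-time expansion described in the 
quote above can be sketched in plain Python (a minimal illustration assuming a 
COO-style index layout; the function is hypothetical and is not TVM's actual 
prepare_dense_matrix_from_sparse() utility):

```python
# Minimal sketch of compile-time "densify": expand a sparse constant
# stored as COO-style (indices, values) into a dense 2-D tensor.
# Illustrative only -- not TVM's actual utility.

def densify(indices, values, shape, default=0.0):
    """Return a dense nested list from COO (row, col) index pairs."""
    rows, cols = shape
    dense = [[default] * cols for _ in range(rows)]
    for (r, c), v in zip(indices, values):
        dense[r][c] = v
    return dense

# Example: a 3x2 weight matrix with two non-zero entries.
weights = densify([(0, 1), (2, 0)], [5.0, 7.0], (3, 2))
# weights == [[0.0, 5.0], [0.0, 0.0], [7.0, 0.0]]
```

   The same expansion could in principle run at runtime via sparse_to_dense, 
which is the fallback the comment above alludes to.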
   Sorry about the delay in reviewing.
   
   I think there are a couple of design points here that we should consider so 
that we can make a well-informed decision on the direction.
   
   1. The advantage of keeping the sparse weight representation all the way 
through is that it keeps the weight size low in the generated binary, which 
helps with static size / footprint at the expense of some runtime cost 
(obviously). For use cases that go through BYOC, or other flows with weight 
compression techniques, this should work out reasonably well; however, where 
we fall back to traditional TVM compilation on CPUs and GPUs there are no 
weight compression techniques, and thus this contributes to rodata bloat or 
flash bloat.
   
   2. Even if it doesn't make sense to keep this in the final representation, 
to be expanded at runtime either as an op for the Relay VM or as a new op in 
the graph runtime: does this operator appear in other frameworks, and if so, 
does it make sense to keep the behaviour common across the stack so that the 
logic can be reused by other frontends as well? That is, the various frontends 
would lower to a relay.op.densify(), whose result represents a dense weight 
tensor; further legalization could then end up creating the actual dense 
weight tensor. One of the principles of compiler design is to bind things as 
late as possible, giving us a chance to optimize and to recover information 
easily without too much reverse engineering. So this comes back to that as 
well.
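   To make the late-binding idea in point 2 concrete, here is a rough sketch 
in plain Python (the Densify node and legalize pass are illustrative names 
invented for this example, not existing Relay APIs):

```python
# Hypothetical sketch of "bind as late as possible": frontends emit a
# symbolic Densify node over a sparse constant; a later legalization
# pass decides when to materialize the dense tensor.
# Class and function names are illustrative, not real Relay APIs.

class SparseConst:
    """A constant stored sparsely as COO (indices, values, shape)."""
    def __init__(self, indices, values, shape):
        self.indices, self.values, self.shape = indices, values, shape

class Densify:
    """Symbolic op whose result *represents* the dense weight tensor."""
    def __init__(self, sparse):
        self.sparse = sparse

def legalize(node):
    """Late pass: materialize Densify into an actual dense constant."""
    if isinstance(node, Densify):
        s = node.sparse
        dense = [[0.0] * s.shape[1] for _ in range(s.shape[0])]
        for (r, c), v in zip(s.indices, s.values):
            dense[r][c] = v
        return dense  # now a plain dense constant
    return node  # anything else passes through unchanged

w = Densify(SparseConst([(1, 0)], [3.0], (2, 2)))
legalize(w)  # -> [[0.0, 0.0], [3.0, 0.0]]
```

   The point of the sketch is only that the frontend never commits to a dense 
layout itself; the decision is deferred to a pass that can still see the 
sparse form.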
   
   TBH I would probably prefer #2 as the approach, to reduce duplication 
across the frontends and to give other compilers, and others who use Relay, 
access to the densify op. I don't think we can use it for any of our work 
immediately. If performance or code size becomes an issue, we'll need to 
reconsider a runtime or partial-runtime representation of sparse weights, at 
which point #1 would be a stepping stone towards it.
   
   These are my thoughts on the design.
   
   regards
   Ramana
   
   > As to your other comments, I have tried to add comments in the necessary 
places. Please have a look, and let me know if anything more is required. TIA!
   
   
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

